pg_receivexlog sometimes fails on first start

Started by Heikki Linnakangas about 11 years ago, 3 messages, pgsql-bugs
#1 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

When starting pg_receivexlog for the first time, i.e. with an empty
directory, it sometimes fails like this:

pg_receivexlog: unexpected termination of replication stream: ERROR:
requested starting point 56/70000000 is ahead of the WAL flush position
of this server 56/6FFC4000
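The two positions in this error can be compared directly. A minimal sketch, assuming the usual textual LSN form (upper and lower 32 bits in hexadecimal, separated by a slash); parse_lsn and format_lsn are hypothetical helper names, not PostgreSQL functions:

```python
def parse_lsn(text: str) -> int:
    """Parse an LSN written as 'HI/LO' (both hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format a 64-bit LSN back into the 'HI/LO' hex form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


# Values taken from the error message above.
requested_start = parse_lsn("56/70000000")
flush_position = parse_lsn("56/6FFC4000")

# The server refuses a starting point it has not flushed yet.
ahead = requested_start > flush_position
gap = requested_start - flush_position

print(f"requested {format_lsn(requested_start)} is ahead of flush "
      f"{format_lsn(flush_position)}: {ahead}, by {gap} bytes")
# prints: requested 56/70000000 is ahead of flush 56/6FFC4000: True, by 245760 bytes
```

So the requested start was only 240 KiB past what the server had flushed, which fits the report that a steady write load without commits makes the failure easy to hit.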

This is easier to reproduce if you have a steady stream of updates, with
no commits, running concurrently. A bulk load, for example.

The problem is that pg_receivexlog issues the IDENTIFY_SYSTEM command,
which returns the current WAL insert position. It then requests to start
streaming from that position. Or to be precise, from the beginning of
the WAL segment containing that position. That's wrong; the WAL might
not be flushed up to the insert position yet.
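A minimal sketch of the segment rounding described above, assuming the default 16 MB WAL segment size (the report does not state the segment size). The insert position 56/70001234 is hypothetical: the error only implies that it fell somewhere in the segment starting at 56/70000000. segment_start is an illustrative helper, not the pg_receivexlog source.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(text: str) -> int:
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_start(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Round an LSN down to the start of the WAL segment containing it."""
    return lsn - (lsn % seg_size)


# Hypothetical insert position (low bits invented for illustration).
insert_position = parse_lsn("56/70001234")
flush_position = parse_lsn("56/6FFC4000")

start = segment_start(insert_position)
print(format_lsn(start))                          # 56/70000000
print(start > flush_position)                     # True: the server rejects it
print(format_lsn(segment_start(flush_position)))  # 56/6F000000
```

Rounding down only helps when the insert and flush positions share a segment; once the insert position has crossed into a new segment, even that segment's first byte is still ahead of the flush position.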

I think returning the insert position in IDENTIFY_SYSTEM is a bad idea.
We even mention in the documentation that that field is "useful to get a
known location in the transaction log where streaming can start", but
the insert position is wrong for that purpose.

Any objections to changing IDENTIFY_SYSTEM to return the flush position,
instead of insert position? We haven't heard any complaints from the
field about this, but IMHO that should also be back-patched. It's a
design bug, and the fix is simple and unlikely to cause any harm.
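The proposed change can be modelled the same way. This is a sketch, not PostgreSQL code: server_accepts stands in for the server-side check behind the error message (a start point ahead of the flush position is refused), and choose_start for the client rounding to a segment start; both names are hypothetical, and the 16 MB segment size is again an assumption. Because rounding down never moves a position forward, a start derived from the flush position cannot be ahead of that flush position.

```python
import random

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def segment_start(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    return lsn - (lsn % seg_size)


def server_accepts(start: int, flush: int) -> bool:
    # Model of the check in the report: starting ahead of flush is an error.
    return start <= flush


def choose_start(reported: int) -> int:
    # Client side: begin at the segment containing the reported position.
    return segment_start(reported)


flush = (0x56 << 32) | 0x6FFC4000
insert = (0x56 << 32) | 0x70001234  # hypothetical insert position

old = choose_start(insert)  # IDENTIFY_SYSTEM reports the insert position
new = choose_start(flush)   # proposed: IDENTIFY_SYSTEM reports flush position

print(server_accepts(old, flush))  # False
print(server_accepts(new, flush))  # True

# segment_start(x) <= x for every x, so the proposed choice always passes.
for _ in range(10_000):
    f = random.getrandbits(64)
    assert server_accepts(choose_start(f), f)
```

Assuming the flush position only advances, the start point stays valid even if START_REPLICATION is issued some time after IDENTIFY_SYSTEM.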

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: pg_receivexlog sometimes fails on first start

On Fri, Feb 6, 2015 at 6:17 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

> When starting pg_receivexlog for the first time, i.e. with an empty
> directory, it sometimes fails like this:
>
> pg_receivexlog: unexpected termination of replication stream: ERROR:
> requested starting point 56/70000000 is ahead of the WAL flush position of
> this server 56/6FFC4000
>
> This is easier to reproduce if you have a steady stream of updates, with no
> commits, running concurrently. A bulk load, for example.
>
> The problem is that pg_receivexlog issues the IDENTIFY_SYSTEM command, which
> returns the current WAL insert position. It then requests to start streaming
> from that position. Or to be precise, from the beginning of the WAL segment
> containing that position. That's wrong; the WAL might not be flushed up to
> the insert position yet.
>
> I think returning the insert position in IDENTIFY_SYSTEM is a bad idea. We
> even mention in the documentation that that field is "useful to get a known
> location in the transaction log where streaming can start", but the insert
> position is wrong for that purpose.

Good catch.

> Any objections to changing IDENTIFY_SYSTEM to return the flush position,
> instead of insert position? We haven't heard any complaints from the field
> about this, but IMHO that should also be back-patched. It's a design bug,
> and the fix is simple and unlikely to cause any harm.

+1 with your idea.

Regards,

--
Fujii Masao

#3 Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#2)
Re: pg_receivexlog sometimes fails on first start

On Fri, Feb 6, 2015 at 4:02 AM, Fujii Masao wrote:

> On Fri, Feb 6, 2015 at 6:17 AM, Heikki Linnakangas wrote:
>
>> Any objections to changing IDENTIFY_SYSTEM to return the flush position,
>> instead of insert position? We haven't heard any complaints from the field
>> about this, but IMHO that should also be back-patched. It's a design bug,
>> and the fix is simple and unlikely to cause any harm.
>
> +1 with your idea.

+1.
-- 
Michael
