Updated version of pg_receivexlog

Started by Magnus Hagander over 14 years ago | 35 messages | hackers
#1Magnus Hagander
magnus@hagander.net

Here's an updated version of pg_receivexlog, that should now actually
work (it previously failed miserably when a replication record crossed
a WAL file boundary - something which I at the time could not properly
reproduce, but when I restarted my work on it now could easily
reproduce every time :D).
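
To make the boundary case concrete: a chunk of WAL data received from the stream can start in one segment file and end in the next, so the receiver has to split it before writing. The following is only a minimal sketch, assuming the default 16 MB segment size; `SEG_SIZE` and `bytes_for_current_segment` are hypothetical names chosen for illustration and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed default WAL segment size (16 MB); hypothetical name. */
#define SEG_SIZE (16u * 1024u * 1024u)

/*
 * Given the current byte offset inside the open segment file
 * (seg_offset < SEG_SIZE) and the length of a received chunk, return how
 * many bytes of the chunk belong to the current segment. Whatever is left
 * over (len minus the result) belongs to the next segment file.
 */
size_t
bytes_for_current_segment(uint32_t seg_offset, size_t len)
{
	size_t		room = (size_t) (SEG_SIZE - seg_offset);

	return len < room ? len : room;
}
```

For instance, with 100 bytes of room left in the current segment and a 300-byte chunk, the first 100 bytes go to the current file and the remaining 200 to the next one.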

It also contains an update to pg_basebackup that allows it to stream
the transaction log in the background while the backup is running,
thus reducing the need for wal_keep_segments (if the client can keep
up, it should eliminate the need completely).

In doing so, it moves a number of functions from pg_basebackup.c to
the new file streamutil.c, to be shared between both pg_basebackup and
pg_receivexlog.

So far at least, it's completely client-side, with no changes to the
server. This means that it can be dropped into src/bin on 9.1 as well
to get a version that runs there (since we're way way way past feature
freeze and can't actually stick it in there in the official tree)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachments:

pg_receivexlog.diff (text/x-patch; charset=US-ASCII; name=pg_receivexlog.diff), +1768 -248
#2Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Magnus Hagander (#1)
Re: Updated version of pg_receivexlog

On Tue, Aug 16, 2011 at 9:32 AM, Magnus Hagander <magnus@hagander.net> wrote:

Here's an updated version of pg_receivexlog, that should now actually
work (it previously failed miserably when a replication record crossed
a WAL file boundary - something which I at the time could not properly
reproduce, but when I restarted my work on it now could easily
reproduce every time :D).

It also contains an update to pg_basebackup that allows it to stream
the transaction log in the background while the backup is running,
thus reducing the need for wal_keep_segments (if the client can keep
up, it should eliminate the need completely).

reviewing this...

I found pg_receivexlog useful as an independent utility, but I'm not
so sure about the ability to call it from pg_basebackup via the --xlog
option. When called independently, pg_receivexlog will continue
streaming even after pg_basebackup finishes, but not in the other
case, so the use case for --xlog seems narrower and more error prone
(i.e., you said that it reduces the need for wal_keep_segments *if the
client can keep up*... how can we know that before starting
pg_basebackup?)

pg_receivexlog worked good in my tests.

pg_basebackup with --xlog=stream gives me an already recycled wal
segment message (note that the file was in pg_xlog in the standby):
FATAL: could not receive data from WAL stream: FATAL: requested WAL
segment 00000001000000000000005C has already been removed

I haven't read all the code in detail, but it seems fine to me.

in other things:
do we need to include src/bin/pg_basebackup/.gitignore in the patch?

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

#3Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Jaime Casanova (#2)
Re: Updated version of pg_receivexlog

On Wed, Sep 28, 2011 at 1:38 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

reviewing this...

btw, executing 'make world' with this patch gives me this error (seems
like an entry is missing in doc/src/sgml/ref/allfiles.sgml):

jade:reference.sgml:223:4:E: general entity "pgReceivexlog" not
defined and no default entity

#4Magnus Hagander
magnus@hagander.net
In reply to: Jaime Casanova (#2)
Re: Updated version of pg_receivexlog

On Wed, Sep 28, 2011 at 08:38, Jaime Casanova <jaime@2ndquadrant.com> wrote:

reviewing this...

I found pg_receivexlog useful as an independent utility, but I'm not
so sure about the ability to call it from pg_basebackup via the --xlog
option. When called independently, pg_receivexlog will continue
streaming even after pg_basebackup finishes, but not in the other
case, so the use case for --xlog seems narrower and more error prone
(i.e., you said that it reduces the need for wal_keep_segments *if the
client can keep up*... how can we know that before starting
pg_basebackup?)

These two are not intended to be used together.

pg_basebackup --xlog=stream is intended for the same use case as
"pg_basebackup -x" today, which is to take a backup of just the parts
that you actually need to clone the database, but to do so without
having to guesstimate the value for wal_keep_segments.

pg_receivexlog worked good in my tests.

pg_basebackup with --xlog=stream gives me an already recycled wal
segment message (note that the file was in pg_xlog in the standby):
FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
segment 00000001000000000000005C has already been removed

Do you get this reproducibly? Or did you get it just once?

And when you say "in the standby" what are you referring to? There is
no standby server in the case of pg_basebackup --xlog=stream, it's
just a backup... But are you saying pg_basebackup had received the file,
yet tried to get it again?

in other things:
do we need to include src/bin/pg_basebackup/.gitignore in the patch?

Not sure what you mean? We need to add pg_receivexlog to this file,
yes - in head it just contains pg_basebackup.

#5Magnus Hagander
magnus@hagander.net
In reply to: Jaime Casanova (#3)
Re: Updated version of pg_receivexlog

On Wed, Sep 28, 2011 at 09:30, Jaime Casanova <jaime@2ndquadrant.com> wrote:

btw, executing 'make world' with this patch gives me this error (seems
like an entry is missing in doc/src/sgml/ref/allfiles.sgml):

jade:reference.sgml:223:4:E: general entity "pgReceivexlog" not
defined and no default entity

Ugh, how did I miss that. You need this:

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8a8616b..382d297 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgCtl              SYSTEM "pg_ctl-ref.sgml">
 <!ENTITY pgDump             SYSTEM "pg_dump.sgml">
 <!ENTITY pgDumpall          SYSTEM "pg_dumpall.sgml">
+<!ENTITY pgReceivexlog      SYSTEM "pg_receivexlog.sgml">
 <!ENTITY pgResetxlog        SYSTEM "pg_resetxlog.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">

I think I broke it in a merge at some point.

#6Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Magnus Hagander (#4)
Re: Updated version of pg_receivexlog

On Wed, Sep 28, 2011 at 12:50 PM, Magnus Hagander <magnus@hagander.net> wrote:

pg_receivexlog worked good in my tests.

pg_basebackup with --xlog=stream gives me an already recycled wal
segment message (note that the file was in pg_xlog in the standby):
FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
segment 00000001000000000000005C has already been removed

Do you get this reproducibly? Or did you get it just once?

And when you say "in the standby" what are you referring to? There is
no standby server in the case of pg_basebackup --xlog=stream, it's
just a backup... But are you saying pg_basebackup had received the file,
yet tried to get it again?

OK, I was trying to set up a standby server by cloning with
pg_basebackup... can't I use it that way?

The docs say:
"""
If this option is specified, it is possible to start a postmaster
directly in the extracted directory without the need to consult the
log archive, thus making this a completely standalone backup.
"""

It doesn't say that it is not possible to use this for a standby
server... that's probably why I get the error: I put a recovery.conf
in place after pg_basebackup finished... maybe we can say that more loudly?

in other things:
do we need to include src/bin/pg_basebackup/.gitignore in the patch?

Not sure what you mean? We need to add pg_receivexlog to this file,
yes - in head it just contains pg_basebackup.

Your patch includes a modification to the file
src/bin/pg_basebackup/.gitignore. Maybe I'm just being annoying;
besides, it's a simple change... just forget that...

#7Magnus Hagander
magnus@hagander.net
In reply to: Jaime Casanova (#6)
Re: Updated version of pg_receivexlog

On Thu, Sep 29, 2011 at 01:55, Jaime Casanova <jaime@2ndquadrant.com> wrote:

OK, I was trying to set up a standby server by cloning with
pg_basebackup... can't I use it that way?

The docs say:
"""
If this option is specified, it is possible to start a postmaster
directly in the extracted directory without the need to consult the
log archive, thus making this a completely standalone backup.
"""

It doesn't say that it is not possible to use this for a standby
server... that's probably why I get the error: I put a recovery.conf
in place after pg_basebackup finished... maybe we can say that more loudly?

The idea is, if you use it with -x (or --xlog), it's for taking a
backup/clone, *not* for replication.

If you use it without -x, then you can use it as the start of a
replica, by adding a recovery.conf.

But you can't do both at once, that will confuse it.

in other things:
do we need to include src/bin/pg_basebackup/.gitignore in the patch?

Not sure what you mean? We need to add pg_receivexlog to this file,
yes - in head it just contains pg_basebackup.

Your patch includes a modification to the file
src/bin/pg_basebackup/.gitignore. Maybe I'm just being annoying;
besides, it's a simple change... just forget that...

Well, it needs to be included in the commit, and if I exclude it in the
posted patch, I'll just forget it in the end :-)

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Magnus Hagander (#7)
Re: Updated version of pg_receivexlog
+ 		/*
+ 		 * Looks like an xlog file. Parse it's position.

s/it's/its/

+ 		 */
+ 		if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
+ 		{
+ 			fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"),
+ 					progname, dirent->d_name);
+ 			disconnect_and_exit(1);
+ 		}
+ 		log *= XLOG_SEG_SIZE;

That multiplication by XLOG_SEG_SIZE could overflow, if logid is very
high. It seems completely unnecessary, anyway,

s/IDENFITY_SYSTEM/IDENTIFY_SYSTEM/ (two occurrences)

In pg_basebackup, it would be a good sanity check to check that the
systemid returned by IDENTIFY_SYSTEM in the main connection and the
WAL-streaming connection match. Just to be sure that some connection
pooler didn't hijack one of the connections and point to a different
server. And better check timelineid too while you're at it.
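
A rough sketch of such a check, under the assumption that the system identifier and timeline have already been fetched as strings from each connection's IDENTIFY_SYSTEM result (for example with libpq's PQgetvalue). The `SystemIdentity` struct and `same_server` function are hypothetical names for illustration only, not code from the patch.

```c
#include <string.h>

/* Values as reported by IDENTIFY_SYSTEM, kept as strings (hypothetical type). */
typedef struct
{
	const char *systemid;
	const char *timeline;
} SystemIdentity;

/*
 * Returns 1 if both connections report the same system identifier and
 * the same timeline, 0 otherwise.
 */
int
same_server(const SystemIdentity *a, const SystemIdentity *b)
{
	return strcmp(a->systemid, b->systemid) == 0 &&
		strcmp(a->timeline, b->timeline) == 0;
}
```

A caller would compare the identity seen on the main connection with the one seen on the WAL-streaming connection and abort the backup on a mismatch.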

How does this interact with synchronous replication? If a base backup
that streams WAL is in progress, and you have synchronous_standby_names
set to '*', I believe the in-progress backup will count as a standby for
that purpose. That might give a false sense of security.
synchronous_standby_names='*' is prone to such confusion in general, but
it seems that it's particularly surprising if a running pg_basebackup
lets a commit in synchronous replication proceed. Maybe we just need
a warning in the docs. I think we should advise that
synchronous_standby_names='*' is dangerous in general, and cite this as
one reason for that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#8)
Re: Updated version of pg_receivexlog

On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

+               /*
+                * Looks like an xlog file. Parse it's position.

s/it's/its/

+                */
+               if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
+               {
+                       fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"),
+                                       progname, dirent->d_name);
+                       disconnect_and_exit(1);
+               }
+               log *= XLOG_SEG_SIZE;

That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high.
It seems completely unnecessary, anyway,

How do you mean completely unnecessary? We'd have to change the points
that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a
way to get around the overflow, but I'm not sure that's what you mean?

s/IDENFITY_SYSTEM/IDENTIFY_SYSTEM/ (two occurrences)

Oops.

In pg_basebackup, it would be a good sanity check to check that the systemid
returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming
connection match. Just to be sure that some connection pooler didn't hijack
one of the connections and point to a different server. And better check
timelineid too while you're at it.

That's a good idea. Will fix.

How does this interact with synchronous replication? If a base backup that
streams WAL is in progress, and you have synchronous_standby_names set to
'*', I believe the in-progress backup will count as a standby for that
purpose. That might give a false sense of security.

Ah yes. Did not think of that. Yes, it will have this problem.

synchronous_standby_names='*' is prone to such confusion in general, but it
seems that it's particularly surprising if a running pg_basebackup lets a
commit in synchronous replication proceed. Maybe we just need a warning
in the docs. I think we should advise that synchronous_standby_names='*' is
dangerous in general, and cite this as one reason for that.

Hmm. I think this is common enough that we want to make sure we avoid
it in code.

Could we pass a parameter from the client indicating to the master
that it refuses to be a sync slave? An optional keyword to the
START_REPLICATION command, perhaps?

#10Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Magnus Hagander (#9)
Re: Updated version of pg_receivexlog

On Mon, Oct 24, 2011 at 7:40 AM, Magnus Hagander <magnus@hagander.net> wrote:

Hmm. I think this is common enough that we want to make sure we avoid
it in code.

Could we pass a parameter from the client indicating to the master
that it refuses to be a sync slave? An optional keyword to the
START_REPLICATION command, perhaps?

Can't you execute "set synchronous_commit to off/local" for this connection?

#11Magnus Hagander
magnus@hagander.net
In reply to: Jaime Casanova (#10)
Re: Updated version of pg_receivexlog

On Mon, Oct 24, 2011 at 16:12, Jaime Casanova <jaime@2ndquadrant.com> wrote:

Can't you execute "set synchronous_commit to off/local" for this connection?

This is a walsender connection, it doesn't take SQL. Plus it's the
receiving end, and SET sync_commit is for the sending end.

That said, we are reasonably safe in current implementations, because
it always sets the flush location to invalidxlogptr, so it will not be
considered for sync slave. Should we ever start accepting "write" as
the point to sync against, the problem will show up, of course.
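
The reasoning can be modelled in a few lines. This is a deliberately simplified sketch and not the server's actual logic: it assumes a receiver reports a write position and a flush position, that an all-zero position stands for "invalid", and that only receivers with a valid flush position are eligible as synchronous standbys. All type and function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-part WAL position; all zeros is treated as invalid. */
typedef struct
{
	uint32_t	xlogid;
	uint32_t	xrecoff;
} WalPos;

/* Hypothetical feedback a receiver sends back to the sender. */
typedef struct
{
	WalPos		write;		/* how far data has been written */
	WalPos		flush;		/* how far data has been flushed to disk */
} ReceiverFeedback;

bool
walpos_is_invalid(WalPos p)
{
	return p.xlogid == 0 && p.xrecoff == 0;
}

/* In this model, only a valid flush position makes a sync candidate. */
bool
counts_as_sync_standby(const ReceiverFeedback *fb)
{
	return !walpos_is_invalid(fb->flush);
}
```

A client that always reports an invalid flush position is never eligible in this model, whatever its write position. If eligibility were instead based on the write position, the same client would start to qualify, which is the future problem mentioned above.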

#12Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#9)
Re: Updated version of pg_receivexlog

On Mon, Oct 24, 2011 at 14:40, Magnus Hagander <magnus@hagander.net> wrote:

On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

+               /*
+                * Looks like an xlog file. Parse it's position.

s/it's/its/

+                */
+               if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
+               {
+                       fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"),
+                                       progname, dirent->d_name);
+                       disconnect_and_exit(1);
+               }
+               log *= XLOG_SEG_SIZE;

That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high.
It seems completely unnecessary, anyway,

How do you mean completely unnecessary? We'd have to change the points
that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a
way to get around the overflow, but I'm not sure that's what you mean?

Talked to Heikki on IM about this one, turns out we were both wrong.
It's needed, but there was a bug hiding under it, due to (once again)
mixing up segments and offsets. Has been fixed now.

In pg_basebackup, it would be a good sanity check to check that the systemid
returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming
connection match. Just to be sure that some connection pooler didn't hijack
one of the connections and point to a different server. And better check
timelineid too while you're at it.

That's a good idea. Will fix.

Added to the new version of the patch.

How does this interact with synchronous replication? If a base backup that
streams WAL is in progress, and you have synchronous_standby_names set to
'*', I believe the in-progress backup will count as a standby for that
purpose. That might give a false sense of security.

Ah yes. Did not think of that. Yes, it will have this problem.

Actually, thinking more, per other mail, it won't. Because it will
never report that the data is synced to disk, so it will not be
considered for sync standby.

This is something we might consider in the future (it could be a
reasonable scenario where you had this), but not in the first version.

Updated version of the patch attached.

Attachments:

pg_receivexlog2.diff (text/x-patch; charset=US-ASCII; name=pg_receivexlog2.diff), +1829 -248
#13Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#12)
Re: Updated version of pg_receivexlog

On Tue, Oct 25, 2011 at 12:37, Magnus Hagander <magnus@hagander.net> wrote:

Updated version of the patch attached.

I've applied this version with a few more minor changes that Heikki found.

His comment about .partial files still applies, and I intend to
address this in a follow-up commit, along with some further
documentation enhancements.

#14Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#13)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander <magnus@hagander.net> wrote:

I've applied this version with a few more minor changes that Heikki found.

Cool!

When I tried pg_receivexlog and checked the contents of the streamed WAL
file with xlogdump, I found that recent WAL records that walsender had
already sent don't exist in that WAL file. I expected pg_receivexlog to
write the streamed WAL records to disk as soon as possible, but it
doesn't. Is this intentional? Or a bug?
Am I missing something?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#15Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#14)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 09:29, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander <magnus@hagander.net> wrote:

I've applied this version with a few more minor changes that Heikki found.

Cool!

When I tried pg_receivexlog and checked the contents of the streamed WAL
file with xlogdump, I found that recent WAL records that walsender had
already sent don't exist in that WAL file. I expected pg_receivexlog to
write the streamed WAL records to disk as soon as possible, but it
doesn't. Is this intentional? Or a bug?
Am I missing something?

It writes it to disk as soon as possible, but doesn't fsync() until
the end of each segment. Are you by any chance looking at the file
while it's running?

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#15)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander <magnus@hagander.net> wrote:

It writes it to disk as soon as possible, but doesn't fsync() until
the end of each segment. Are you by any chance looking at the file
while it's running?

No. I looked at that file after shutting down the master server.

Regards,

#17Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#16)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 09:46, Fujii Masao <masao.fujii@gmail.com> wrote:

No. I looked at that file after shutting down the master server.

Ugh, in that case something is certainly wrong. There is nothing but
setting up some offset values between PQgetCopyData() and write()...

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#17)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander <magnus@hagander.net> wrote:

Ugh, in that case something is certainly wrong. There is nothing but
setting up some offset values between PQgetCopyData() and write()...

When the end of the copy stream is found or an error happens,
pg_receivexlog exits without flushing outstanding WAL records, which
seems to cause the problem I reported.

Regards,

#19Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#18)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 10:12, Fujii Masao <masao.fujii@gmail.com> wrote:

When the end of the copy stream is found or an error happens,
pg_receivexlog exits without flushing outstanding WAL records, which
seems to cause the problem I reported.

Not sure I follow. When we arrive at PQgetCopyData() there should be
nothing buffered, and if the end of stream happens there it returns
-1, and we exit, no? So where is the data that's lost?

I do realize we don't actually fsync() and close() in this case - is
that what you are referring to? But the data should already have been
write()d, so it should still be there, no?
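
The distinction between write() and fsync() can be checked with a small standalone program. The sketch below is only an illustration on a POSIX system and has nothing to do with the pg_receivexlog source; `write_then_read` is a hypothetical helper. It writes data with write() alone, then reads it back through a second descriptor before any fsync() or close() has happened.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `data` to `path` using write() only, then read it back through a
 * second file descriptor while the writer is still open and unsynced.
 * Returns the number of bytes read back, or -1 on error.
 */
ssize_t
write_then_read(const char *path, const char *data, char *out, size_t outlen)
{
	size_t		len = strlen(data);
	ssize_t		n;
	int			rfd;
	int			wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (wfd < 0)
		return -1;
	if (write(wfd, data, len) != (ssize_t) len)
	{
		close(wfd);
		return -1;
	}

	/* Deliberately no fsync() and no close() on the writer yet. */
	rfd = open(path, O_RDONLY);
	if (rfd < 0)
	{
		close(wfd);
		return -1;
	}
	n = read(rfd, out, outlen);
	close(rfd);
	close(wfd);
	return n;
}
```

The data is visible to other readers as soon as write() returns; fsync() only matters for whether it survives an operating system crash or power loss.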

#20Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#19)
Re: Updated version of pg_receivexlog

On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander <magnus@hagander.net> wrote:

Not sure I follow. When we arrive at PQgetCopyData() there should be
nothing buffered, and if the end of stream happens there it returns
-1, and we exit, no? So where is the data that's lost?

I do realize we don't actually fsync() and close() in this case - is
that what you are referring to? But the data should already have been
write()d, so it should still be there, no?

Oh, right. Hmm.. xlogdump might be the cause.

Though I've not read the code of xlogdump, I wonder if it gives up
outputting the contents of a WAL file when it finds a partial WAL page...
It strikes me that the recovery code has the same problem. No?
IOW, when a partial WAL page is found during recovery, I'm afraid
that page would not be replayed even though it contains valid data.

Regards,

#21Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#20)
#22Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#21)
#23Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#22)
#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#23)
#25Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#24)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#25)
#28Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#27)
#29Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Magnus Hagander (#25)
#30Fujii Masao
masao.fujii@gmail.com
In reply to: Magnus Hagander (#28)
#31Ants Aasma
ants.aasma@cybertec.at
In reply to: Magnus Hagander (#7)
#32Fujii Masao
masao.fujii@gmail.com
In reply to: Ants Aasma (#31)
#33Ants Aasma
ants.aasma@cybertec.at
In reply to: Fujii Masao (#32)
#34Magnus Hagander
magnus@hagander.net
In reply to: Ants Aasma (#33)
#35Ants Aasma
ants.aasma@cybertec.at
In reply to: Magnus Hagander (#34)