pg_basebackup -x stream from the standby gets stuck

Started by Fujii Masaoalmost 14 years ago7 messages

masao.fujii@gmail.com

almost 14 years ago

1 attachment(s)

Hi,

http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/

=$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
xlog start point: 2/AC4E2600
pg_basebackup: starting background WAL receiver
692447/692447 kB (100%), 1/1 tablespace
xlog end point: 2/AC4E2600
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed

real 3m56.237s
user 0m0.224s
sys 0m0.936s

(time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)

The above article points out the problem of pg_basebackup from the standby:
when "-x stream" is specified, pg_basebackup from the standby gets stuck if
there is no traffic in the database.

When "-x stream" is specified, pg_basebackup forks the background process
for receiving WAL records during backup, takes an online backup and waits for
the background process to end. The forked background process keeps receiving
WAL records, and whenever it reaches end of WAL file, it checks whether it has
already received all WAL files required for the backup, and exits if yes. Which
means that at least one WAL segment switch is required for pg_basebackup with
"-x stream" option to end.

In the backup from the master, WAL file switch always occurs at both start and
end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
above logic works fine even if there is no traffic. OTOH, in the backup from the
standby, while there is no traffic, WAL file switch is not performed at all. So
in that case, there is no chance that the background process reaches end of WAL
file, check whether all required WAL arrives and exit. At the end, pg_basebackup
gets stuck.

To fix the problem, I'd propose to change the background process so that it
checks whether all required WAL has arrived, every time data is received, even
if end of WAL file is not reached. Patch attached. Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

fix_backup_stuck_v1.patchtext/x-diff; charset=US-ASCII; name=fix_backup_stuck_v1.patchDownload

*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
***************
*** 78,84 **** static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
  static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
  static void BaseBackup(void);
  
! static bool segment_callback(XLogRecPtr segendpos, uint32 timeline);
  
  #ifdef HAVE_LIBZ
  static const char *
--- 78,84 ----
  static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
  static void BaseBackup(void);
  
! static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline);
  
  #ifdef HAVE_LIBZ
  static const char *
***************
*** 129,136 **** usage(void)
  
  
  /*
!  * Called in the background process whenever a complete segment of WAL
!  * has been received.
   * On Unix, we check to see if there is any data on our pipe
   * (which would mean we have a stop position), and if it is, check if
   * it is time to stop.
--- 129,137 ----
  
  
  /*
!  * Called in the background process every time data is received.
!  * Also called when the streaming stops to check whether
!  * the current log segment can be treated as a complete one.
   * On Unix, we check to see if there is any data on our pipe
   * (which would mean we have a stop position), and if it is, check if
   * it is time to stop.
***************
*** 138,144 **** usage(void)
   * time to stop.
   */
  static bool
! segment_callback(XLogRecPtr segendpos, uint32 timeline)
  {
  	if (!has_xlogendptr)
  	{
--- 139,145 ----
   * time to stop.
   */
  static bool
! reached_end_position(XLogRecPtr segendpos, uint32 timeline)
  {
  	if (!has_xlogendptr)
  	{
***************
*** 231,237 **** LogStreamerMain(logstreamer_param * param)
  {
  	if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
  						   param->sysidentifier, param->xlogdir,
! 						   segment_callback, NULL, standby_message_timeout))
  
  		/*
  		 * Any errors will already have been reported in the function process,
--- 232,238 ----
  {
  	if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
  						   param->sysidentifier, param->xlogdir,
! 						   reached_end_position, reached_end_position, standby_message_timeout))
  
  		/*
  		 * Any errors will already have been reported in the function process,
*** a/src/bin/pg_basebackup/pg_receivexlog.c
--- b/src/bin/pg_basebackup/pg_receivexlog.c
***************
*** 71,77 **** usage(void)
  static bool
  segment_callback(XLogRecPtr segendpos, uint32 timeline)
  {
! 	if (verbose)
  		fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"),
  				progname, segendpos.xlogid, segendpos.xrecoff, timeline);
  
--- 71,77 ----
  static bool
  segment_callback(XLogRecPtr segendpos, uint32 timeline)
  {
! 	if (verbose && segendpos.xrecoff % XLOG_SEG_SIZE == 0)
  		fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"),
  				progname, segendpos.xlogid, segendpos.xrecoff, timeline);
  
***************
*** 82,88 **** segment_callback(XLogRecPtr segendpos, uint32 timeline)
  }
  
  static bool
! continue_streaming(void)
  {
  	if (time_to_abort)
  	{
--- 82,88 ----
  }
  
  static bool
! continue_streaming(XLogRecPtr segendpos, uint32 timeline)
  {
  	if (time_to_abort)
  	{
*** a/src/bin/pg_basebackup/receivelog.c
--- b/src/bin/pg_basebackup/receivelog.c
***************
*** 113,120 **** open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir, char *namebu
  	return f;
  }
  
  static bool
! close_walfile(int walfile, char *basedir, char *walname)
  {
  	off_t		currpos = lseek(walfile, 0, SEEK_CUR);
  
--- 113,126 ----
  	return f;
  }
  
+ /*
+  * Close the current WAL file, and rename it to the correct filename if it's complete.
+  *
+  * If segment_complete is true, rename the current WAL file even if we've not
+  * completed writing the whole segment.
+  */
  static bool
! close_walfile(int walfile, char *basedir, char *walname, bool segment_complete)
  {
  	off_t		currpos = lseek(walfile, 0, SEEK_CUR);
  
***************
*** 141,149 **** close_walfile(int walfile, char *basedir, char *walname)
  
  	/*
  	 * Rename the .partial file only if we've completed writing the
! 	 * whole segment.
  	 */
! 	if (currpos == XLOG_SEG_SIZE)
  	{
  		char		oldfn[MAXPGPATH];
  		char		newfn[MAXPGPATH];
--- 147,155 ----
  
  	/*
  	 * Rename the .partial file only if we've completed writing the
! 	 * whole segment or segment_complete is true.
  	 */
! 	if (currpos == XLOG_SEG_SIZE || segment_complete)
  	{
  		char		oldfn[MAXPGPATH];
  		char		newfn[MAXPGPATH];
***************
*** 206,211 **** localGetCurrentTimestamp(void)
--- 212,221 ----
   * return. As long as they return false, streaming will continue
   * indefinitely.
   *
+  * The segment_finish callback will also be called when the
+  * streaming will stop to check whether the current log segment
+  * can be treated as a complete one.
+  *
   * standby_message_timeout controls how often we send a message
   * back to the master letting it know our progress, in seconds.
   * This message will only contain the write location, and never
***************
*** 288,298 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, char *sysi
  		/*
  		 * Check if we should continue streaming, or abort at this point.
  		 */
! 		if (stream_continue && stream_continue())
  		{
  			if (walfile != -1)
  				/* Potential error message is written by close_walfile */
! 				return close_walfile(walfile, basedir, current_walfile_name);
  			return true;
  		}
  
--- 298,310 ----
  		/*
  		 * Check if we should continue streaming, or abort at this point.
  		 */
! 		if (stream_continue && stream_continue(blockpos, timeline))
  		{
  			if (walfile != -1)
  				/* Potential error message is written by close_walfile */
! 				return close_walfile(walfile, basedir, current_walfile_name,
! 									 segment_finish != NULL ?
! 									 segment_finish(blockpos, timeline) : false);
  			return true;
  		}
  
***************
*** 486,492 **** ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, char *sysi
  			/* Did we reach the end of a WAL segment? */
  			if (blockpos.xrecoff % XLOG_SEG_SIZE == 0)
  			{
! 				if (!close_walfile(walfile, basedir, current_walfile_name))
  					/* Error message written in close_walfile() */
  					return false;
  
--- 498,504 ----
  			/* Did we reach the end of a WAL segment? */
  			if (blockpos.xrecoff % XLOG_SEG_SIZE == 0)
  			{
! 				if (!close_walfile(walfile, basedir, current_walfile_name, false))
  					/* Error message written in close_walfile() */
  					return false;
  
*** a/src/bin/pg_basebackup/receivelog.h
--- b/src/bin/pg_basebackup/receivelog.h
***************
*** 2,8 ****
  
  /*
   * Called whenever a segment is finished, return true to stop
!  * the streaming at this point.
   */
  typedef bool (*segment_finish_callback)(XLogRecPtr segendpos, uint32 timeline);
  
--- 2,10 ----
  
  /*
   * Called whenever a segment is finished, return true to stop
!  * the streaming at this point. Also called when the streaming
!  * stops to check whether the current log segment can be
!  * treated as a complete one.
   */
  typedef bool (*segment_finish_callback)(XLogRecPtr segendpos, uint32 timeline);
  
***************
*** 10,16 **** typedef bool (*segment_finish_callback)(XLogRecPtr segendpos, uint32 timeline);
   * Called before trying to read more data. Return true to stop
   * the streaming at this point.
   */
! typedef bool (*stream_continue_callback)(void);
  
  extern bool ReceiveXlogStream(PGconn *conn,
  							  XLogRecPtr startpos,
--- 12,18 ----
   * Called before trying to read more data. Return true to stop
   * the streaming at this point.
   */
! typedef bool (*stream_continue_callback)(XLogRecPtr segendpos, uint32 timeline);
  
  extern bool ReceiveXlogStream(PGconn *conn,
  							  XLogRecPtr startpos,

Magnus Hagander

magnus@hagander.net

almost 14 years ago

In reply to: Fujii Masao (#1)

Re: pg_basebackup -x stream from the standby gets stuck

On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/

=$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
xlog start point: 2/AC4E2600
pg_basebackup: starting background WAL receiver
692447/692447 kB (100%), 1/1 tablespace
xlog end point: 2/AC4E2600
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed

real 3m56.237s
user 0m0.224s
sys 0m0.936s

(time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)

The above article points out the problem of pg_basebackup from the standby:
when "-x stream" is specified, pg_basebackup from the standby gets stuck if
there is no traffic in the database.

When "-x stream" is specified, pg_basebackup forks the background process
for receiving WAL records during backup, takes an online backup and waits for
the background process to end. The forked background process keeps receiving
WAL records, and whenever it reaches end of WAL file, it checks whether it has
already received all WAL files required for the backup, and exits if yes. Which
means that at least one WAL segment switch is required for pg_basebackup with
"-x stream" option to end.

In the backup from the master, WAL file switch always occurs at both start and
end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
above logic works fine even if there is no traffic. OTOH, in the backup from the
standby, while there is no traffic, WAL file switch is not performed at all. So
in that case, there is no chance that the background process reaches end of WAL
file, check whether all required WAL arrives and exit. At the end, pg_basebackup
gets stuck.

To fix the problem, I'd propose to change the background process so that it
checks whether all required WAL has arrived, every time data is received, even
if end of WAL file is not reached. Patch attached. Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though? I thought only
pg_basebackup had tihs issue?

I guess it is because of the change of the API to
stream_continue_callback only? Looking at it after your patch,
stream_continue_callback and segment_finish_callback are the same.
Should we perhaps just fold them into a single
stream_continue_callback? Since you had to move the "detect segment
end" to the caller anyway?

Another question related to this - since we clearly don't need the
xlog switch in this case, should we make it conditional on the master
as well, so we don't switch unnecessarily there as well?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Fujii Masao

masao.fujii@gmail.com

almost 14 years ago

In reply to: Magnus Hagander (#2)

Re: pg_basebackup -x stream from the standby gets stuck

On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/

=$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
xlog start point: 2/AC4E2600
pg_basebackup: starting background WAL receiver
692447/692447 kB (100%), 1/1 tablespace
xlog end point: 2/AC4E2600
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed

real 3m56.237s
user 0m0.224s
sys 0m0.936s

(time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)

The above article points out the problem of pg_basebackup from the standby:
when "-x stream" is specified, pg_basebackup from the standby gets stuck if
there is no traffic in the database.

When "-x stream" is specified, pg_basebackup forks the background process
for receiving WAL records during backup, takes an online backup and waits for
the background process to end. The forked background process keeps receiving
WAL records, and whenever it reaches end of WAL file, it checks whether it has
already received all WAL files required for the backup, and exits if yes. Which
means that at least one WAL segment switch is required for pg_basebackup with
"-x stream" option to end.

In the backup from the master, WAL file switch always occurs at both start and
end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
above logic works fine even if there is no traffic. OTOH, in the backup from the
standby, while there is no traffic, WAL file switch is not performed at all. So
in that case, there is no chance that the background process reaches end of WAL
file, check whether all required WAL arrives and exit. At the end, pg_basebackup
gets stuck.

To fix the problem, I'd propose to change the background process so that it
checks whether all required WAL has arrived, every time data is received, even
if end of WAL file is not reached. Patch attached. Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though? I thought only
pg_basebackup had tihs issue?

I guess it is because of the change of the API to
stream_continue_callback only?

Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.

But the reason why I changed segment_callback() in pg_receivexlog.c is not the
same. I did that because previously segment_finish_callback is called
only at the
end of WAL segment but in the patch it can be called at the middle of segment.
OTOH, segment_callback() must emit a verbose message only when current
WAL segment is complete. So I had to add the check of whether current WAL
segment is partial or complete into segment_callback().

Looking at it after your patch,
stream_continue_callback and segment_finish_callback are the same.
Should we perhaps just fold them into a single
stream_continue_callback? Since you had to move the "detect segment
end" to the caller anyway?

No. I think we cannot do that because in pg_receivexlog they are not the same.

Another question related to this - since we clearly don't need the
xlog switch in this case, should we make it conditional on the master
as well, so we don't switch unnecessarily there as well?

Maybe. At the end of backup, we force WAL segment switch, to ensure all required
WAL files have been archived. So theoretically if WAL archiving is not enabled,
we can skip WAL segment switch. But some backup tools might depend on this
behavior....

In standby-only backup, we always skip WAL segment switch. So there is
no guarantee
that all WAL files required for the backup are archived at the end of
backup. This
limitation is documented.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Magnus Hagander

magnus@hagander.net

almost 14 years ago

In reply to: Fujii Masao (#3)

Re: pg_basebackup -x stream from the standby gets stuck

On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/

=$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
xlog start point: 2/AC4E2600
pg_basebackup: starting background WAL receiver
692447/692447 kB (100%), 1/1 tablespace
xlog end point: 2/AC4E2600
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed

real 3m56.237s
user 0m0.224s
sys 0m0.936s

(time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)

The above article points out the problem of pg_basebackup from the standby:
when "-x stream" is specified, pg_basebackup from the standby gets stuck if
there is no traffic in the database.

When "-x stream" is specified, pg_basebackup forks the background process
for receiving WAL records during backup, takes an online backup and waits for
the background process to end. The forked background process keeps receiving
WAL records, and whenever it reaches end of WAL file, it checks whether it has
already received all WAL files required for the backup, and exits if yes. Which
means that at least one WAL segment switch is required for pg_basebackup with
"-x stream" option to end.

In the backup from the master, WAL file switch always occurs at both start and
end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
above logic works fine even if there is no traffic. OTOH, in the backup from the
standby, while there is no traffic, WAL file switch is not performed at all. So
in that case, there is no chance that the background process reaches end of WAL
file, check whether all required WAL arrives and exit. At the end, pg_basebackup
gets stuck.

To fix the problem, I'd propose to change the background process so that it
checks whether all required WAL has arrived, every time data is received, even
if end of WAL file is not reached. Patch attached. Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though? I thought only
pg_basebackup had tihs issue?

I guess it is because of the change of the API to
stream_continue_callback only?

Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.

But the reason why I changed segment_callback() in pg_receivexlog.c is not the
same. I did that because previously segment_finish_callback is called
only at the
end of WAL segment but in the patch it can be called at the middle of segment.
OTOH, segment_callback() must emit a verbose message only when current
WAL segment is complete. So I had to add the check of whether current WAL
segment is partial or complete into segment_callback().

Yeah, I caught that.

Looking at it after your patch,
stream_continue_callback and segment_finish_callback are the same.
Should we perhaps just fold them into a single
stream_continue_callback? Since you had to move the "detect segment
end" to the caller anyway?

No. I think we cannot do that because in pg_receivexlog they are not the same.

But couldn't they be made the same by making the same check as you put
in for the verbose message above?

Another question related to this - since we clearly don't need the
xlog switch in this case, should we make it conditional on the master
as well, so we don't switch unnecessarily there as well?

Maybe. At the end of backup, we force WAL segment switch, to ensure all required
WAL files have been archived. So theoretically if WAL archiving is not enabled,
we can skip WAL segment switch. But some backup tools might depend on this
behavior....

I was thinking we could keep doing it in pg_stop_backup(), but avoid
doing it when using pg_basebackup only...

In standby-only backup, we always skip WAL segment switch. So there is
no guarantee
that all WAL files required for the backup are archived at the end of
backup. This
limitation is documented.

Right.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Magnus Hagander (#4)

1 attachment(s)

Re: pg_basebackup -x stream from the standby gets stuck

On Fri, Mar 2, 2012 at 2:26 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:

Hi,

http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/

=$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
xlog start point: 2/AC4E2600
pg_basebackup: starting background WAL receiver
692447/692447 kB (100%), 1/1 tablespace
xlog end point: 2/AC4E2600
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed

real 3m56.237s
user 0m0.224s
sys 0m0.936s

(time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)

The above article points out the problem of pg_basebackup from the standby:
when "-x stream" is specified, pg_basebackup from the standby gets stuck if
there is no traffic in the database.

When "-x stream" is specified, pg_basebackup forks the background process
for receiving WAL records during backup, takes an online backup and waits for
the background process to end. The forked background process keeps receiving
WAL records, and whenever it reaches end of WAL file, it checks whether it has
already received all WAL files required for the backup, and exits if yes. Which
means that at least one WAL segment switch is required for pg_basebackup with
"-x stream" option to end.

In the backup from the master, WAL file switch always occurs at both start and
end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
above logic works fine even if there is no traffic. OTOH, in the backup from the
standby, while there is no traffic, WAL file switch is not performed at all. So
in that case, there is no chance that the background process reaches end of WAL
file, check whether all required WAL arrives and exit. At the end, pg_basebackup
gets stuck.

To fix the problem, I'd propose to change the background process so that it
checks whether all required WAL has arrived, every time data is received, even
if end of WAL file is not reached. Patch attached. Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though? I thought only
pg_basebackup had tihs issue?

I guess it is because of the change of the API to
stream_continue_callback only?

Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.

But the reason why I changed segment_callback() in pg_receivexlog.c is not the
same. I did that because previously segment_finish_callback is called
only at the
end of WAL segment but in the patch it can be called at the middle of segment.
OTOH, segment_callback() must emit a verbose message only when current
WAL segment is complete. So I had to add the check of whether current WAL
segment is partial or complete into segment_callback().

Yeah, I caught that.

Looking at it after your patch,
stream_continue_callback and segment_finish_callback are the same.
Should we perhaps just fold them into a single
stream_continue_callback? Since you had to move the "detect segment
end" to the caller anyway?

No. I think we cannot do that because in pg_receivexlog they are not the same.

But couldn't they be made the same by making the same check as you put
in for the verbose message above?

While reviewing and cleaning this patch up a bit I noticed it actually
broke pg_receivexlog in the renaming.

Here is a new version of the patch, reworked based on the above so
we're down to a single callback. I moved the "rename last segment file
even if it's not complete" to be a parameter into ReceiveXlogStream()
instead of trying to overload a third functionality on the callback
(which is what broke pg_receivexlog).

How does this look? Have I overlooked any cases?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

xlog_stream2.patchapplication/octet-stream; name=xlog_stream2.patchDownload

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0289c4b..b349dee 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -78,7 +78,7 @@ static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
 static void BaseBackup(void);
 
-static bool segment_callback(XLogRecPtr segendpos, uint32 timeline);
+static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline);
 
 #ifdef HAVE_LIBZ
 static const char *
@@ -129,8 +129,7 @@ usage(void)
 
 
 /*
- * Called in the background process whenever a complete segment of WAL
- * has been received.
+ * Called in the background process every time data is received.
  * On Unix, we check to see if there is any data on our pipe
  * (which would mean we have a stop position), and if it is, check if
  * it is time to stop.
@@ -138,7 +137,7 @@ usage(void)
  * time to stop.
  */
 static bool
-segment_callback(XLogRecPtr segendpos, uint32 timeline)
+reached_end_position(XLogRecPtr segendpos, uint32 timeline)
 {
 	if (!has_xlogendptr)
 	{
@@ -231,7 +230,7 @@ LogStreamerMain(logstreamer_param * param)
 {
 	if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
 						   param->sysidentifier, param->xlogdir,
-						   segment_callback, NULL, standby_message_timeout))
+						   reached_end_position, standby_message_timeout, true))
 
 		/*
 		 * Any errors will already have been reported in the function process,
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 2134c87..067da59 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -43,7 +43,7 @@ volatile bool time_to_abort = false;
 static void usage(void);
 static XLogRecPtr FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline);
 static void StreamLog();
-static bool segment_callback(XLogRecPtr segendpos, uint32 timeline);
+static bool continue_streaming(XLogRecPtr segendpos, uint32 timeline, bool segment_finished);
 
 static void
 usage(void)
@@ -69,21 +69,12 @@ usage(void)
 }
 
 static bool
-segment_callback(XLogRecPtr segendpos, uint32 timeline)
+continue_streaming(XLogRecPtr segendpos, uint32 timeline, bool segment_finished)
 {
-	if (verbose)
+	if (verbose && segment_finished)
 		fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"),
 				progname, segendpos.xlogid, segendpos.xrecoff, timeline);
 
-	/*
-	 * Never abort from this - we handle all aborting in continue_streaming()
-	 */
-	return false;
-}
-
-static bool
-continue_streaming(void)
-{
 	if (time_to_abort)
 	{
 		fprintf(stderr, _("%s: received interrupt signal, exiting.\n"),
@@ -268,8 +259,8 @@ StreamLog(void)
 				progname, startpos.xlogid, startpos.xrecoff, timeline);
 
 	ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
-					  segment_callback, continue_streaming,
-					  standby_message_timeout);
+					  continue_streaming,
+					  standby_message_timeout, false);
 
 	PQfinish(conn);
 }
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index b0cf836..5a77328 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -113,8 +113,14 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir, char *namebu
 	return f;
 }
 
+/*
+ * Close the current WAL file, and rename it to the correct filename if it's complete.
+ *
+ * If segment_complete is true, rename the current WAL file even if we've not
+ * completed writing the whole segment.
+ */
 static bool
-close_walfile(int walfile, char *basedir, char *walname)
+close_walfile(int walfile, char *basedir, char *walname, bool segment_complete)
 {
 	off_t		currpos = lseek(walfile, 0, SEEK_CUR);
 
@@ -141,9 +147,9 @@ close_walfile(int walfile, char *basedir, char *walname)
 
 	/*
 	 * Rename the .partial file only if we've completed writing the
-	 * whole segment.
+	 * whole segment or segment_complete is true.
 	 */
-	if (currpos == XLOG_SEG_SIZE)
+	if (currpos == XLOG_SEG_SIZE || segment_complete)
 	{
 		char		oldfn[MAXPGPATH];
 		char		newfn[MAXPGPATH];
@@ -199,11 +205,10 @@ localGetCurrentTimestamp(void)
  * All received segments will be written to the directory
  * specified by basedir.
  *
- * The segment_finish callback will be called after each segment
- * has been finished, and the stream_continue callback will be
- * called every time data is received. If either of these callbacks
- * return true, the streaming will stop and the function
- * return. As long as they return false, streaming will continue
+ * The stream_continue callback will be called every time data
+ * is received, and whenever a segment is completed. If it returns
+ * true, the streaming will stop and the function
+ * return. As long as it returns false, streaming will continue
  * indefinitely.
  *
  * standby_message_timeout controls how often we send a message
@@ -214,7 +219,7 @@ localGetCurrentTimestamp(void)
  * Note: The log position *must* be at a log segment start!
  */
 bool
-ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, char *sysidentifier, char *basedir, segment_finish_callback segment_finish, stream_continue_callback stream_continue, int standby_message_timeout)
+ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, char *sysidentifier, char *basedir, stream_continue_callback stream_continue, int standby_message_timeout, bool rename_partial)
 {
 	char		query[128];
 	char		current_walfile_name[MAXPGPATH];
@@ -288,11 +293,11 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, char *sysi
 		/*
 		 * Check if we should continue streaming, or abort at this point.
 		 */
-		if (stream_continue && stream_continue())
+		if (stream_continue && stream_continue(blockpos, timeline, false))
 		{
 			if (walfile != -1)
 				/* Potential error message is written by close_walfile */
-				return close_walfile(walfile, basedir, current_walfile_name);
+				return close_walfile(walfile, basedir, current_walfile_name, rename_partial);
 			return true;
 		}
 
@@ -486,20 +491,20 @@ ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, char *sysi
 			/* Did we reach the end of a WAL segment? */
 			if (blockpos.xrecoff % XLOG_SEG_SIZE == 0)
 			{
-				if (!close_walfile(walfile, basedir, current_walfile_name))
+				if (!close_walfile(walfile, basedir, current_walfile_name, false))
 					/* Error message written in close_walfile() */
 					return false;
 
 				walfile = -1;
 				xlogoff = 0;
 
-				if (segment_finish != NULL)
+				if (stream_continue != NULL)
 				{
 					/*
 					 * Callback when the segment finished, and return if it
 					 * told us to.
 					 */
-					if (segment_finish(blockpos, timeline))
+					if (stream_continue(blockpos, timeline, true))
 						return true;
 				}
 			}
diff --git a/src/bin/pg_basebackup/receivelog.h b/src/bin/pg_basebackup/receivelog.h
index 1c61ea8..19bdf2e 100644
--- a/src/bin/pg_basebackup/receivelog.h
+++ b/src/bin/pg_basebackup/receivelog.h
@@ -1,22 +1,16 @@
 #include "access/xlogdefs.h"
 
 /*
- * Called whenever a segment is finished, return true to stop
- * the streaming at this point.
+ * Called before trying to read more data or when a segment is
+ * finished.
  */
-typedef bool (*segment_finish_callback)(XLogRecPtr segendpos, uint32 timeline);
-
-/*
- * Called before trying to read more data. Return true to stop
- * the streaming at this point.
- */
-typedef bool (*stream_continue_callback)(void);
+typedef bool (*stream_continue_callback)(XLogRecPtr segendpos, uint32 timeline, bool segment_finished);
 
 extern bool ReceiveXlogStream(PGconn *conn,
 							  XLogRecPtr startpos,
 							  uint32 timeline,
 							  char *sysidentifier,
 							  char *basedir,
-							  segment_finish_callback segment_finish,
 							  stream_continue_callback stream_continue,
-							  int standby_message_timeout);
+							  int standby_message_timeout,
+							  bool rename_partial);

Fujii Masao

masao.fujii@gmail.com

over 13 years ago

In reply to: Magnus Hagander (#5)

Re: pg_basebackup -x stream from the standby gets stuck

On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander <magnus@hagander.net> wrote:

While reviewing and cleaning this patch up a bit I noticed it actually
broke pg_receivexlog in the renaming.

Here is a new version of the patch, reworked based on the above so
we're down to a single callback. I moved the "rename last segment file
even if it's not complete" to be a parameter into ReceiveXlogStream()
instead of trying to overload a third functionality on the callback
(which is what broke pg_receivexlog).

How does this look? Have I overlooked any cases?

Thanks for the patch! Looks good to me except the followings:

pg_basebackup.c:233: warning: passing argument 6 of
'ReceiveXlogStream' from incompatible pointer type

I got the above warning on compile. To fix this, the third argument
"segment_finished"
needs to be added to reached_end_position().

It seems confusing that *stream_continue()* returns TRUE when
streaming *cannot continue*, i.e.,
its name seems to be inconsistent with what it does. What about
renaming it to stream_stop?
Similarly, it also seems confusing that *continue_streaming()* returns
TRUE when streaming
*cannot continue*.

Regards,

--
Fujii Masao

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Fujii Masao (#6)

Re: pg_basebackup -x stream from the standby gets stuck

On Thu, May 24, 2012 at 7:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander <magnus@hagander.net> wrote:

While reviewing and cleaning this patch up a bit I noticed it actually
broke pg_receivexlog in the renaming.

Here is a new version of the patch, reworked based on the above so
we're down to a single callback. I moved the "rename last segment file
even if it's not complete" to be a parameter into ReceiveXlogStream()
instead of trying to overload a third functionality on the callback
(which is what broke pg_receivexlog).

How does this look? Have I overlooked any cases?

Thanks for the patch! Looks good to me except the followings:

pg_basebackup.c:233: warning: passing argument 6 of
'ReceiveXlogStream' from incompatible pointer type

Hmm. I could've sworn I fixed that. I think I forgot to refresh the patch :-)

It seems confusing that *stream_continue()* returns TRUE when
streaming *cannot continue*, i.e.,
its name seems to be inconsistent with what it does. What about
renaming it to stream_stop?

That's a pre-existing issue, but agreed, I will rename it.

Similarly, it also seems confusing that *continue_streaming()* returns
TRUE when streaming
*cannot continue*.

Yeah, I renamed that one to stop_streaming as well.

Will apply the updated version.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/