odd output in restore mode

Started by Andrew Dunstanover 17 years ago59 messages

andrew@dunslane.net

over 17 years ago

I have just been working on setting up a continuous recovery failover
system, and noticed some odd log lines, shown below. (Using 8.3).

First note that our parsing of recovery.conf in xlog.c is pretty bad,
and at least we need to document the quirks if it's not going to be
fixed. log_restartpoints is said to be boolean, but when I set it to an
unquoted true I got a fatal error, while a quoted 'on' sets it to false,
as seen. Ick. What is more, I apparently managed to get the recovery
server to lose a WAL file and hang totally by having a bad
recovery.conf. Triple ick.

Second, what is all this about .history files? My understanding is that
they are not necessary, so surely we should try to stat them to see if
they are present before trying to copy them. These lines are going to
confuse a lot of people, I suspect (including me).

Lastly, not quite related to this output, but in the same general area,
should we have an option on pg_standby to allow removing the archive
file after it has been restored?

cheers

andrew

LOG: database system was interrupted; last known up at 2008-05-12
15:18:23 EDT
LOG: starting archive recovery
LOG: log_restartpoints = false
LOG: restore_command = '../bin/pg_standby -t ../common_archive/failover
../common_archive %f %p %r '
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
LOG: restored log file "0000000100000000000000A5.00000068.backup" from
archive
LOG: restored log file "0000000100000000000000A5" from archive
LOG: automatic recovery in progress
LOG: redo starts at 0/A50000B0
LOG: restored log file "0000000100000000000000A6" from archive
LOG: restored log file "0000000100000000000000A7" from archive
LOG: restored log file "0000000100000000000000A8" from archive
LOG: restored log file "0000000100000000000000A9" from archive
trigger file found
LOG: could not open file "pg_xlog/0000000100000000000000AA" (log file
0, segment 170): No such file or directory
LOG: redo done at 0/A9000068
LOG: restored log file "0000000100000000000000A9" from archive
cp: cannot stat `../common_archive/00000002.history': No such file or
directory
cp: cannot stat `../common_archive/00000002.history': No such file or
directory
cp: cannot stat `../common_archive/00000002.history': No such file or
directory
LOG: selected new timeline ID: 2
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Andrew Dunstan (#1)

Re: odd output in restore mode

On Mon, 2008-05-12 at 16:57 -0400, Andrew Dunstan wrote:

I have just been working on setting up a continuous recovery failover
system, and noticed some odd log lines, shown below. (Using 8.3).

Hmmm, well, the first time you use something complex, there are some
surprising features, I guess. Most especially the log lines are there to
allow production issues to be diagnosed, not to create a beautiful log.

Many of the things that look somewhat strange are there for a reason,
since a wide range of options and save-your-customers-ass scenarios are
covered by the recovery code.

Suggestions for improvement are always welcome and you are welcome to
suggest doc changes, as many people do.

First note that our parsing of recovery.conf in xlog.c is pretty bad,
and at least we need to document the quirks if it's not going to be
fixed. log_restartpoints is said to be boolean, but when I set it to an
unquoted true I got a fatal error, while a quoted 'on' sets it to false,
as seen. Ick.

Yes, some improvements are definitely due there.

What is more, I apparently managed to get the recovery
server to lose a WAL file and hang totally by having a bad
recovery.conf. Triple ick.

Sounds like a bug you should report in the normal way. Correctness is
paramount. Or are you confusing the message in the log for file AA with
an error?

Second, what is all this about .history files? My understanding is that
they are not necessary, so surely we should try to stat them to see if
they are present before trying to copy them. These lines are going to
confuse a lot of people, I suspect (including me).

I try to keep it as simple as possible, since much of this code only
gets run when you really need it to work. The request for the .history
file and the cp are examples of that.

Lastly, not quite related to this output, but in the same general area,
should we have an option on pg_standby to allow removing the archive
file after it has been restored?

There already is one, but its more complex than that. (%r)

LOG: database system was interrupted; last known up at 2008-05-12
15:18:23 EDT
LOG: starting archive recovery
LOG: log_restartpoints = false
LOG: restore_command = '../bin/pg_standby -t ../common_archive/failover
../common_archive %f %p %r '
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
LOG: restored log file "0000000100000000000000A5.00000068.backup" from
archive
LOG: restored log file "0000000100000000000000A5" from archive
LOG: automatic recovery in progress
LOG: redo starts at 0/A50000B0
LOG: restored log file "0000000100000000000000A6" from archive
LOG: restored log file "0000000100000000000000A7" from archive
LOG: restored log file "0000000100000000000000A8" from archive
LOG: restored log file "0000000100000000000000A9" from archive
trigger file found
LOG: could not open file "pg_xlog/0000000100000000000000AA" (log file
0, segment 170): No such file or directory
LOG: redo done at 0/A9000068
LOG: restored log file "0000000100000000000000A9" from archive
cp: cannot stat `../common_archive/00000002.history': No such file or
directory
cp: cannot stat `../common_archive/00000002.history': No such file or
directory
cp: cannot stat `../common_archive/00000002.history': No such file or
directory
LOG: selected new timeline ID: 2
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
cp: cannot stat `../common_archive/00000001.history': No such file or
directory
LOG: archive recovery complete
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

There is an outstanding Windows issue with pg_standby that your help
would be appreciated with, shown on latest commitfest page. It's a
Windows issue and I don't maintain a Windows dev environment.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Simon Riggs (#2)

Re: odd output in restore mode

Simon Riggs wrote:

What is more, I apparently managed to get the recovery
server to lose a WAL file and hang totally by having a bad
recovery.conf. Triple ick.

Sounds like a bug you should report in the normal way. Correctness is
paramount. Or are you confusing the message in the log for file AA with
an error?

No, it had to do with pg_standby waiting for a WAL file that had already
gone, somehow. I will try to reproduce it when I get a spare moment.

Second, what is all this about .history files? My understanding is that
they are not necessary, so surely we should try to stat them to see if
they are present before trying to copy them. These lines are going to
confuse a lot of people, I suspect (including me).

I try to keep it as simple as possible, since much of this code only
gets run when you really need it to work. The request for the .history
file and the cp are examples of that.

I don't follow. AFAICT no .history file was in fact archived. ISTM that
in this case we should only call RestoreWALFileForRecovery if the file
in fact exists.

Lastly, not quite related to this output, but in the same general area,
should we have an option on pg_standby to allow removing the archive
file after it has been restored?

There already is one, but its more complex than that. (%r)

I was using %r. But the WAL files that have been restored (according to
the log) are still in the archive dir. So it looks like %r isn't working
properly.

There is an outstanding Windows issue with pg_standby that your help
would be appreciated with, shown on latest commitfest page. It's a
Windows issue and I don't maintain a Windows dev environment.

The patch has been rejected for now, according to the Commitfest page.
Not sure what you want my help on.

BTW, none of what I reported was on Windows - it's on Linux.

cheers

andrew

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Andrew Dunstan (#3)

Re: odd output in restore mode

On Mon, 2008-05-12 at 18:58 -0400, Andrew Dunstan wrote:

No, it had to do with pg_standby waiting for a WAL file that had already
gone, somehow. I will try to reproduce it when I get a spare moment.

Sounds like the bug I just fixed.

There is an outstanding Windows issue with pg_standby that your help
would be appreciated with, shown on latest commitfest page. It's a
Windows issue and I don't maintain a Windows dev environment.

The patch has been rejected for now, according to the Commitfest page.
Not sure what you want my help on.

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

Robert Treat

xzilla@users.sourceforge.net

over 17 years ago

In reply to: Andrew Dunstan (#3)

Re: odd output in restore mode

On Monday 12 May 2008 18:58:37 Andrew Dunstan wrote:

Simon Riggs wrote:

Lastly, not quite related to this output, but in the same general area,
should we have an option on pg_standby to allow removing the archive
file after it has been restored?

There already is one, but its more complex than that. (%r)

I was using %r. But the WAL files that have been restored (according to
the log) are still in the archive dir. So it looks like %r isn't working
properly.

Are you sure you've moved passed the latest restart point? Just because a WAL
file has been processed doesn't mean it can be deleted.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Simon Riggs (#4)

Re: odd output in restore mode

Simon Riggs wrote:

On Mon, 2008-05-12 at 18:58 -0400, Andrew Dunstan wrote:

No, it had to do with pg_standby waiting for a WAL file that had already
gone, somehow. I will try to reproduce it when I get a spare moment.

Sounds like the bug I just fixed.

Yes, so I see. I didn't have that fix, so I'll test again with the patch.

There is an outstanding Windows issue with pg_standby that your help
would be appreciated with, shown on latest commitfest page. It's a
Windows issue and I don't maintain a Windows dev environment.

The patch has been rejected for now, according to the Commitfest page.
Not sure what you want my help on.

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

cheers

andrew

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Andrew Dunstan (#6)

Re: odd output in restore mode

Andrew Dunstan <andrew@dunslane.net> writes:

Simon Riggs wrote:

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

I think the open issue is how to know whether pgwin32_safestat fixes the
problem that the kluge tried to work around.

regards, tom lane

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Robert Treat (#5)

Re: odd output in restore mode

Robert Treat wrote:

On Monday 12 May 2008 18:58:37 Andrew Dunstan wrote:

Simon Riggs wrote:

Lastly, not quite related to this output, but in the same general area,
should we have an option on pg_standby to allow removing the archive
file after it has been restored?

There already is one, but its more complex than that. (%r)

I was using %r. But the WAL files that have been restored (according to
the log) are still in the archive dir. So it looks like %r isn't working
properly.

Are you sure you've moved passed the latest restart point? Just because a WAL
file has been processed doesn't mean it can be deleted.

Thanks. It wasn't that, but when I ran with the very latest patches this
problem went away.

cheers

andrew

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Tom Lane (#7)

Re: odd output in restore mode

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Simon Riggs wrote:

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

I think the open issue is how to know whether pgwin32_safestat fixes the
problem that the kluge tried to work around.

Well, I think we need to consider quite a number of scenarios. The
archive directory could be local, on a remote Windows machine, or on a
remote Samba server. The archive file could be copied by Windows copy,
or Unix cp, or scp, or rsync, among others.

I'd like to know the setup that was found to produce the error, to start
with.

cheers

andrew

#10

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Andrew Dunstan (#8)

Re: odd output in restore mode

On Mon, 2008-05-12 at 22:40 -0400, Andrew Dunstan wrote:

Robert Treat wrote:

On Monday 12 May 2008 18:58:37 Andrew Dunstan wrote:

Simon Riggs wrote:

Lastly, not quite related to this output, but in the same general area,
should we have an option on pg_standby to allow removing the archive
file after it has been restored?

There already is one, but its more complex than that. (%r)

I was using %r. But the WAL files that have been restored (according to
the log) are still in the archive dir. So it looks like %r isn't working
properly.

Are you sure you've moved passed the latest restart point? Just because a WAL
file has been processed doesn't mean it can be deleted.

Thanks. It wasn't that, but when I ran with the very latest patches this
problem went away.

Thanks for testing.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#11

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Andrew Dunstan (#9)

Re: odd output in restore mode

On Mon, 2008-05-12 at 23:03 -0400, Andrew Dunstan wrote:

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Simon Riggs wrote:

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

I think the open issue is how to know whether pgwin32_safestat fixes the
problem that the kluge tried to work around.

Well, I think we need to consider quite a number of scenarios. The
archive directory could be local, on a remote Windows machine, or on a
remote Samba server. The archive file could be copied by Windows copy,
or Unix cp, or scp, or rsync, among others.

I'd like to know the setup that was found to produce the error, to start
with.

It's a race condition, not a deterministic bug with recreatable
conditions. My understanding is the current code was introduced to work
around the implementation of stat on Windows which says the filesize is
correct even while it is still copying it. The 1sec delay fixed that but
is clearly not a foolproof fix and introduces a delay also, which was
the original complaint.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#12

Dave Page

dpage@pgadmin.org

over 17 years ago

In reply to: Tom Lane (#7)

Re: odd output in restore mode

On Tue, May 13, 2008 at 2:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Simon Riggs wrote:

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

I think the open issue is how to know whether pgwin32_safestat fixes the
problem that the kluge tried to work around.

Per the comments on the commitfest page, I don't believe it is.
pgwin32_safestat fixes a bug in which stat() returns stale information
(if memory serves). The hack in pg_standby was added because copy in
Windows appears to preallocate the required space for the file it's
copying, thus checking the file size to verify that the copy has
completed is not a valid test.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#13

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Dave Page (#12)

Re: odd output in restore mode

On Tue, 2008-05-13 at 08:42 +0100, Dave Page wrote:

On Tue, May 13, 2008 at 2:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Simon Riggs wrote:

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

I think the open issue is how to know whether pgwin32_safestat fixes the
problem that the kluge tried to work around.

Per the comments on the commitfest page, I don't believe it is.
pgwin32_safestat fixes a bug in which stat() returns stale information
(if memory serves). The hack in pg_standby was added because copy in
Windows appears to preallocate the required space for the file it's
copying, thus checking the file size to verify that the copy has
completed is not a valid test.

Could somebody suggest and test an improvement to the Windows code, to
fix the kluge?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#14

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Simon Riggs (#13)

Re: odd output in restore mode

Simon Riggs wrote:

On Tue, 2008-05-13 at 08:42 +0100, Dave Page wrote:

On Tue, May 13, 2008 at 2:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Simon Riggs wrote:

Well, the patch was rejected long ago, not sure why its in this
commitfest. But its an open issue on the Windows port.

Surely the right fix is to use the recently implemented
pgwin32_safestat() (if we aren't already - I suspect we probably are)
and remove the kluge in pg_standby.c.

I think the open issue is how to know whether pgwin32_safestat fixes the
problem that the kluge tried to work around.

Per the comments on the commitfest page, I don't believe it is.
pgwin32_safestat fixes a bug in which stat() returns stale information
(if memory serves). The hack in pg_standby was added because copy in
Windows appears to preallocate the required space for the file it's
copying, thus checking the file size to verify that the copy has
completed is not a valid test.

Could somebody suggest and test an improvement to the Windows code, to
fix the kluge?

Given what Dave says, I'm not sure there is an easy one, at least
without a lot of testing. Greg Stark's suggestion might or might not work.

However, we should probably make the behaviour switchable. If the
archive_command populating the archive_directory were rsync, for
example, this problem should not occur, because it copies to a temp
file, and then renames it, so we should never see an incomplete file
even though rsync also apparently preallocates space.

We should also document it better in the code, along the lines of Dave's
comment above.

cheers

andrew

#15

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Andrew Dunstan (#14)

Re: odd output in restore mode

I wrote:

However, we should probably make the behaviour switchable. If the
archive_command populating the archive_directory were rsync, for
example, this problem should not occur, because it copies to a temp
file, and then renames it, so we should never see an incomplete file
even though rsync also apparently preallocates space.

Another and probably simpler thing to try would be the GnuWin32 version
of cp. If we can verify that it behaves itself, we should probably
recommend it for use in archive_command instead of the native Windows copy.

I'm still not sure how to construct a test, though.

cheers

andrew

#16

Alvaro Herrera

alvherre@commandprompt.com

over 17 years ago

In reply to: Andrew Dunstan (#15)

Re: odd output in restore mode

Andrew Dunstan wrote:

Another and probably simpler thing to try would be the GnuWin32 version
of cp. If we can verify that it behaves itself, we should probably
recommend it for use in archive_command instead of the native Windows
copy.

Perhaps use xcopy, which should be more ubiquitous?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Alvaro Herrera (#16)

Re: odd output in restore mode

Alvaro Herrera wrote:

Andrew Dunstan wrote:

Another and probably simpler thing to try would be the GnuWin32 version
of cp. If we can verify that it behaves itself, we should probably
recommend it for use in archive_command instead of the native Windows
copy.

Perhaps use xcopy, which should be more ubiquitous?

I would be very surprised if xcopy did not exhibit the same
preallocating behaviour as copy.

cheers

andrew

#18

Alvaro Herrera

alvherre@commandprompt.com

over 17 years ago

In reply to: Andrew Dunstan (#17)

Re: odd output in restore mode

Andrew Dunstan wrote:

I would be very surprised if xcopy did not exhibit the same
preallocating behaviour as copy.

I, on the other hand, would not say anything until someone tried it, and
then wouldn't be surprised if it behaved either way :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#19

Dave Page

dpage@pgadmin.org

over 17 years ago

In reply to: Alvaro Herrera (#18)

Re: odd output in restore mode

On Tue, May 13, 2008 at 5:11 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Andrew Dunstan wrote:

I would be very surprised if xcopy did not exhibit the same
preallocating behaviour as copy.

I, on the other hand, would not say anything until someone tried it, and
then wouldn't be surprised if it behaved either way :-)

It pre-allocates the space as copy does. And yes, I did test :-p

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#20

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Dave Page (#19)

Re: odd output in restore mode

Dave Page wrote:

On Tue, May 13, 2008 at 5:11 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Andrew Dunstan wrote:

I would be very surprised if xcopy did not exhibit the same
preallocating behaviour as copy.

I, on the other hand, would not say anything until someone tried it, and
then wouldn't be surprised if it behaved either way :-)

It pre-allocates the space as copy does. And yes, I did test :-p

Dave,

I don't know how you tested, but could you please repeat the test with
GnuWin32's cp.exe? If it doesn't preallocate the space then I think our
way forward is reasonably clear:

. we recommend its use for Windows archive_command settings
. we provide the delay kluge as switchable behaviour on Windows instead
of having it always on.

cheers

andrew

#21

Dave Page

dpage@pgadmin.org

over 17 years ago

In reply to: Andrew Dunstan (#20)

Re: odd output in restore mode

On Sun, May 18, 2008 at 1:38 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

I don't know how you tested,

Copy a large file across a relatively slow network, and check the size
on the destination drive before it finishes.

but could you please repeat the test with
GnuWin32's cp.exe? If it doesn't preallocate the space then I think our way
forward is reasonably clear:

It does not pre-allocate.

. we recommend its use for Windows archive_command settings
. we provide the delay kluge as switchable behaviour on Windows instead of
having it always on.

Sounds reasonable to me.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#22

Bruce Momjian

bruce@momjian.us

over 17 years ago

In reply to: Andrew Dunstan (#1)

1 attachment(s)

Re: odd output in restore mode

Andrew Dunstan wrote:

I have just been working on setting up a continuous recovery failover
system, and noticed some odd log lines, shown below. (Using 8.3).

First note that our parsing of recovery.conf in xlog.c is pretty bad,
and at least we need to document the quirks if it's not going to be
fixed. log_restartpoints is said to be boolean, but when I set it to an
unquoted true I got a fatal error, while a quoted 'on' sets it to false,
as seen. Ick. What is more, I apparently managed to get the recovery

I have fixed the boolean problem with the attached, applied patch. It
exposes guc.c::parse_bool() for use in xlog.c.

I assume all the other problems you reported have been corrected.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.314
diff -c -c -r1.314 xlog.c
*** src/backend/access/transam/xlog.c	12 Jun 2008 09:12:30 -0000	1.314
--- src/backend/access/transam/xlog.c	30 Jun 2008 22:10:07 -0000
***************
*** 4523,4535 ****
  			/*
  			 * does nothing if a recovery_target is not also set
  			 */
! 			if (strcmp(tok2, "true") == 0)
! 				recoveryTargetInclusive = true;
! 			else
! 			{
! 				recoveryTargetInclusive = false;
! 				tok2 = "false";
! 			}
  			ereport(LOG,
  					(errmsg("recovery_target_inclusive = %s", tok2)));
  		}
--- 4523,4532 ----
  			/*
  			 * does nothing if a recovery_target is not also set
  			 */
! 			if (!parse_bool(tok2, &recoveryTargetInclusive))
! 				  ereport(ERROR,
! 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 					  errmsg("parameter \"recovery_target_inclusive\" requires a Boolean value")));
  			ereport(LOG,
  					(errmsg("recovery_target_inclusive = %s", tok2)));
  		}
***************
*** 4538,4550 ****
  			/*
  			 * does nothing if a recovery_target is not also set
  			 */
! 			if (strcmp(tok2, "true") == 0)
! 				recoveryLogRestartpoints = true;
! 			else
! 			{
! 				recoveryLogRestartpoints = false;
! 				tok2 = "false";
! 			}
  			ereport(LOG,
  					(errmsg("log_restartpoints = %s", tok2)));
  		}
--- 4535,4544 ----
  			/*
  			 * does nothing if a recovery_target is not also set
  			 */
! 			if (!parse_bool(tok2, &recoveryLogRestartpoints))
! 				  ereport(ERROR,
! 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 					  errmsg("parameter \"log_restartpoints\" requires a Boolean value")));
  			ereport(LOG,
  					(errmsg("log_restartpoints = %s", tok2)));
  		}
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.457
diff -c -c -r1.457 guc.c
*** src/backend/utils/misc/guc.c	30 Jun 2008 10:58:47 -0000	1.457
--- src/backend/utils/misc/guc.c	30 Jun 2008 22:10:07 -0000
***************
*** 3991,3997 ****
   * If the string parses okay, return true, else false.
   * If okay and result is not NULL, return the value in *result.
   */
! static bool
  parse_bool(const char *value, bool *result)
  {
  	size_t		len = strlen(value);
--- 3991,3997 ----
   * If the string parses okay, return true, else false.
   * If okay and result is not NULL, return the value in *result.
   */
! bool
  parse_bool(const char *value, bool *result)
  {
  	size_t		len = strlen(value);
Index: src/include/utils/guc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/guc.h,v
retrieving revision 1.96
diff -c -c -r1.96 guc.h
*** src/include/utils/guc.h	28 May 2008 09:04:06 -0000	1.96
--- src/include/utils/guc.h	30 Jun 2008 22:10:07 -0000
***************
*** 223,228 ****
--- 223,229 ----
  extern void AtEOXact_GUC(bool isCommit, int nestLevel);
  extern void BeginReportingGUCOptions(void);
  extern void ParseLongOption(const char *string, char **name, char **value);
+ extern bool parse_bool(const char *value, bool *result);
  extern bool set_config_option(const char *name, const char *value,
  				  GucContext context, GucSource source,
  				  GucAction action, bool changeVal);

#23

Bruce Momjian

bruce@momjian.us

over 17 years ago

In reply to: Dave Page (#21)

Re: odd output in restore mode

Dave Page wrote:

On Sun, May 18, 2008 at 1:38 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

I don't know how you tested,

Copy a large file across a relatively slow network, and check the size
on the destination drive before it finishes.

but could you please repeat the test with
GnuWin32's cp.exe? If it doesn't preallocate the space then I think our way
forward is reasonably clear:

It does not pre-allocate.

. we recommend its use for Windows archive_command settings
. we provide the delay kluge as switchable behaviour on Windows instead of
having it always on.

Sounds reasonable to me.

Are there any changes we need to make here?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#24

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Bruce Momjian (#23)

Re: odd output in restore mode

Bruce Momjian wrote:

Dave Page wrote:

On Sun, May 18, 2008 at 1:38 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

I don't know how you tested,

Copy a large file across a relatively slow network, and check the size
on the destination drive before it finishes.

but could you please repeat the test with
GnuWin32's cp.exe? If it doesn't preallocate the space then I think our way
forward is reasonably clear:

It does not pre-allocate.

. we recommend its use for Windows archive_command settings
. we provide the delay kluge as switchable behaviour on Windows instead of
having it always on.

Sounds reasonable to me.

Are there any changes we need to make here?

Yes. Simon has promised a patch to do the above.

cheers

andrew

#25

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Andrew Dunstan (#24)

1 attachment(s)

Re: [HACKERS] odd output in restore mode

On Mon, 2008-06-30 at 19:29 -0400, Andrew Dunstan wrote:

Bruce Momjian wrote:

Dave Page wrote:

On Sun, May 18, 2008 at 1:38 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

I don't know how you tested,

Copy a large file across a relatively slow network, and check the size
on the destination drive before it finishes.

but could you please repeat the test with
GnuWin32's cp.exe? If it doesn't preallocate the space then I think our way
forward is reasonably clear:

It does not pre-allocate.

. we recommend its use for Windows archive_command settings
. we provide the delay kluge as switchable behaviour on Windows instead of
having it always on.

Sounds reasonable to me.

Are there any changes we need to make here?

Yes. Simon has promised a patch to do the above.

Patch implements

* recommendation to use GnuWin32 cp on Windows
* provide "holdtime" delay, default 0 (on all platforms)
* default stays same on Windows="copy" to ensure people upgrading don't
get stung

Patch should be backpatched to 8.3, plus to CVS HEAD.

We should recommend in next 8.3 release notes that people use "-p" or
"-l" rather than just letting it default.

Will add permalink to Wiki when patch appears in archives.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

pg_standby_win32.v1.patchtext/x-patch; charset=UTF-8; name=pg_standby_win32.v1.patchDownload

Index: contrib/pg_standby/pg_standby.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/contrib/pg_standby/pg_standby.c,v
retrieving revision 1.12
diff -c -r1.12 pg_standby.c
*** contrib/pg_standby/pg_standby.c	17 May 2008 01:28:21 -0000	1.12
--- contrib/pg_standby/pg_standby.c	1 Jul 2008 08:18:50 -0000
***************
*** 44,49 ****
--- 44,50 ----
  
  /* Options and defaults */
  int			sleeptime = 5;		/* amount of time to sleep between file checks */
+ int			holdtime = 0;		/* amount of time to wait once file appears full */
  int			waittime = -1;		/* how long we have been waiting, -1 no wait
  								 * yet */
  int			maxwaittime = 0;	/* how long are we prepared to wait for? */
***************
*** 67,75 ****
  char		exclusiveCleanupFileName[MAXPGPATH];		/* the file we need to
  														 * get from archive */
  
! #define RESTORE_COMMAND_COPY 0
! #define RESTORE_COMMAND_LINK 1
! int			restoreCommandType;
  
  #define XLOG_DATA			 0
  #define XLOG_HISTORY		 1
--- 68,83 ----
  char		exclusiveCleanupFileName[MAXPGPATH];		/* the file we need to
  														 * get from archive */
  
! #define RESTORE_COMMAND_CP 				0
! #define RESTORE_COMMAND_LN 				1
! #define RESTORE_COMMAND_WIN32_MKLINK	2
! #define RESTORE_COMMAND_WIN32_COPY		3
! #ifdef WIN32
! int			restoreCommandType = RESTORE_COMMAND_WIN32_COPY;
! #else
! int			restoreCommandType = RESTORE_COMMAND_CP;
! #endif
! 
  
  #define XLOG_DATA			 0
  #define XLOG_HISTORY		 1
***************
*** 113,142 ****
  {
  #ifdef WIN32
  	snprintf(WALFilePath, MAXPGPATH, "%s\\%s", archiveLocation, nextWALFileName);
  	switch (restoreCommandType)
  	{
! 		case RESTORE_COMMAND_LINK:
  			SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
! 		case RESTORE_COMMAND_COPY:
! 		default:
  			SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
  			break;
! 	}
! #else
! 	snprintf(WALFilePath, MAXPGPATH, "%s/%s", archiveLocation, nextWALFileName);
! 	switch (restoreCommandType)
! 	{
! 		case RESTORE_COMMAND_LINK:
! #if HAVE_WORKING_LINK
  			SET_RESTORE_COMMAND("ln -s -f", WALFilePath, xlogFilePath);
  			break;
! #endif
! 		case RESTORE_COMMAND_COPY:
! 		default:
  			SET_RESTORE_COMMAND("cp", WALFilePath, xlogFilePath);
  			break;
  	}
- #endif
  
  	/*
  	 * This code assumes that archiveLocation is a directory You may wish to
--- 121,148 ----
  {
  #ifdef WIN32
  	snprintf(WALFilePath, MAXPGPATH, "%s\\%s", archiveLocation, nextWALFileName);
+ #else
+ 	snprintf(WALFilePath, MAXPGPATH, "%s/%s", archiveLocation, nextWALFileName);
+ #endif
  	switch (restoreCommandType)
  	{
! 		case RESTORE_COMMAND_WIN32_MKLINK:
  			SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
! 		case RESTORE_COMMAND_WIN32_COPY:
  			SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
  			break;
! 		case RESTORE_COMMAND_LN:
  			SET_RESTORE_COMMAND("ln -s -f", WALFilePath, xlogFilePath);
  			break;
! 		case RESTORE_COMMAND_CP:
  			SET_RESTORE_COMMAND("cp", WALFilePath, xlogFilePath);
  			break;
+ 		default:
+ 			fprintf(stderr, "pg_standby: unknown restore command\n");
+ 			fflush(stderr);
+ 			exit(2);
+ 			break;
  	}
  
  	/*
  	 * This code assumes that archiveLocation is a directory You may wish to
***************
*** 175,191 ****
  		}
  		else if (stat_buf.st_size == XLOG_SEG_SIZE)
  		{
- #ifdef WIN32
- 
  			/*
! 			 * Windows reports that the file has the right number of bytes
  			 * even though the file is still being copied and cannot be opened
! 			 * by pg_standby yet. So we wait for sleeptime secs before
  			 * attempting to restore. If that is not enough, we will rely on
  			 * the retry/holdoff mechanism.
  			 */
! 			pg_usleep(sleeptime * 1000000L);
! #endif
  			nextWALFileType = XLOG_DATA;
  			return true;
  		}
--- 181,196 ----
  		}
  		else if (stat_buf.st_size == XLOG_SEG_SIZE)
  		{
  			/*
! 			 * Some utilities report that the file has the right number of bytes
  			 * even though the file is still being copied and cannot be opened
! 			 * by pg_standby yet. So we wait for holdtime secs before
  			 * attempting to restore. If that is not enough, we will rely on
  			 * the retry/holdoff mechanism.
  			 */
! 			if (holdtime > 0)
! 				pg_usleep(holdtime * 1000000L);
! 
  			nextWALFileType = XLOG_DATA;
  			return true;
  		}
***************
*** 441,448 ****
--- 446,455 ----
  	fprintf(stderr, "  -d			generate lots of debugging output (testing only)\n");
  	fprintf(stderr, "  -k NUMFILESTOKEEP	if RESTARTWALFILE not used, removes files prior to limit (0 keeps all)\n");
  	fprintf(stderr, "  -l			links into archive (leaves file in archive)\n");
+ 	fprintf(stderr, "  -p			always uses GNU compatible 'cp' command on all platforms\n");
  	fprintf(stderr, "  -r MAXRETRIES		max number of times to retry, with progressive wait (default=3)\n");
  	fprintf(stderr, "  -s SLEEPTIME		seconds to wait between file checks (min=1, max=60, default=5)\n");
+ 	fprintf(stderr, "  -h HOLDTIME		seconds to wait once file appears full (min=0, max=60, default=0)\n");
  	fprintf(stderr, "  -t TRIGGERFILE	defines a trigger file to initiate failover (no default)\n");
  	fprintf(stderr, "  -w MAXWAITTIME	max seconds to wait for a file (0=no limit)(default=0)\n");
  	fflush(stderr);
***************
*** 463,474 ****
  	(void) signal(SIGINT, sighandler);
  	(void) signal(SIGQUIT, sighandler);
  
! 	while ((c = getopt(argc, argv, "cdk:lr:s:t:w:")) != -1)
  	{
  		switch (c)
  		{
  			case 'c':			/* Use copy */
! 				restoreCommandType = RESTORE_COMMAND_COPY;
  				break;
  			case 'd':			/* Debug mode */
  				debug = true;
--- 470,488 ----
  	(void) signal(SIGINT, sighandler);
  	(void) signal(SIGQUIT, sighandler);
  
! 	while ((c = getopt(argc, argv, "cdh:k:lpr:s:t:w:")) != -1)
  	{
  		switch (c)
  		{
  			case 'c':			/* Use copy */
! #ifdef WIN32
! 				restoreCommandType = RESTORE_COMMAND_WIN32_COPY;
! #else
! 				restoreCommandType = RESTORE_COMMAND_CP;
! #endif
! 				break;
! 			case 'p':			/* Use cp */
! 				restoreCommandType = RESTORE_COMMAND_CP;
  				break;
  			case 'd':			/* Debug mode */
  				debug = true;
***************
*** 483,489 ****
  				}
  				break;
  			case 'l':			/* Use link */
! 				restoreCommandType = RESTORE_COMMAND_LINK;
  				break;
  			case 'r':			/* Retries */
  				maxretries = atoi(optarg);
--- 497,511 ----
  				}
  				break;
  			case 'l':			/* Use link */
! #ifdef WIN32
! 				restoreCommandType = RESTORE_COMMAND_WIN32_MKLINK;
! #else
! #if HAVE_WORKING_LINK
! 				restoreCommandType = RESTORE_COMMAND_LN;
! #else
! 				restoreCommandType = RESTORE_COMMAND_CP;
! #endif
! #endif
  				break;
  			case 'r':			/* Retries */
  				maxretries = atoi(optarg);
***************
*** 503,508 ****
--- 525,539 ----
  					exit(2);
  				}
  				break;
+ 			case 'h':			/* Hold time */
+ 				holdtime = atoi(optarg);
+ 				if (holdtime < 0 || holdtime > 60)
+ 				{
+ 					fprintf(stderr, "usage: pg_standby -h holdtime incorrectly set\n");
+ 					usage();
+ 					exit(2);
+ 				}
+ 				break;
  			case 't':			/* Trigger file */
  				triggerPath = optarg;
  				if (CheckForExternalTrigger())
Index: doc/src/sgml/standby.sgml
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/standby.sgml,v
retrieving revision 1.1
diff -c -r1.1 standby.sgml
*** doc/src/sgml/standby.sgml	10 Nov 2007 23:30:46 -0000	1.1
--- doc/src/sgml/standby.sgml	1 Jul 2008 08:29:16 -0000
***************
*** 136,141 ****
--- 136,149 ----
        </entry>
       </row>
       <row>
+       <entry>-p</entry>
+       <entry>
+        <para>
+         always use cp command on any platform
+        </para>
+       </entry>
+      </row>
+      <row>
        <entry>-r maxretries</entry>
        <entry>
         <para>
***************
*** 160,165 ****
--- 168,182 ----
        </entry>
       </row>
       <row>
+       <entry>-h holdtime</entry>
+       <entry>
+        the number of seconds to wait once a file appears full. 
+ 	   This is required with some utilities, such as Win32 Copy, which
+ 	   sets the filesize before it has completed the copy process.
+        <literal>Default=0</literal>
+       </entry>
+      </row>
+      <row>
        <entry>-t triggerfile</entry>
        <entry>
         the presence of the triggerfile will cause recovery to end
***************
*** 228,234 ****
       *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
      </para>
      <programlisting>
! restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
   ..\archive %f %p 2>> standby.log'
      </programlisting>
      <para>
--- 245,251 ----
       *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
      </para>
      <programlisting>
! restore_command = 'pg_standby -c -d -s 5 -h 5 -w 0 -t C:\pgsql.trigger.5442
   ..\archive %f %p 2>> standby.log'
      </programlisting>
      <para>
***************
*** 238,248 ****
--- 255,272 ----
       <listitem><para>use a copy command to restore WAL files from archive</para></listitem>
       <listitem><para>produce logfile output in standby.log</para></listitem>
       <listitem><para>sleep for 5 seconds between checks for next WAL file is full</para></listitem>
+      <listitem><para>sleep for 5 second after the file appears full</para></listitem>
       <listitem><para>never timeout if file not found</para></listitem>
       <listitem><para>stop waiting when a trigger file called C:\pgsql.trigger.5442 appears</para></listitem>
      </itemizedlist>
     </listitem>
    </itemizedlist>
+     <para>
+      On Windows you are advised to use GNUWin32's cp utility. The cp utility
+ 	 does not set filesize until file copying is complete, so there is no
+ 	 need to use the -h option to introduce additional delays to ensure 
+ 	 correctness. See <ulink url="http://gnuwin32.sourceforge.net/"></ulink>
+     </para>
   </sect2>
   
  </sect1>

#26

Heikki Linnakangas

heikki@enterprisedb.com

over 17 years ago

In reply to: Simon Riggs (#25)

Re: [HACKERS] odd output in restore mode

Simon Riggs wrote:

Patch implements

* recommendation to use GnuWin32 cp on Windows
* provide "holdtime" delay, default 0 (on all platforms)
* default stays same on Windows="copy" to ensure people upgrading don't
get stung

This seems pretty kludgey to me. I wouldn't want to install GnuWin32
utilities on a production system just for the "cp" command, and I don't
know how I would tune holdtime properly for using "copy". And it seems
risky to have defaults that are known to not work reliably.

How about implementing a replacement function for "cp" ourselves? It
seems pretty trivial to do. We could use that on Unixes as well, which
would keep the differences between Win32 and other platforms smaller,
and thus ensure the codepath gets more testing.

(Sorry for jumping into the discussion so late, I didn't follow this
thread earlier, and just read it now in the archives while looking at
the patch.)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#27

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Heikki Linnakangas (#26)

Re: [HACKERS] odd output in restore mode

On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

Patch implements

* recommendation to use GnuWin32 cp on Windows
* provide "holdtime" delay, default 0 (on all platforms)
* default stays same on Windows="copy" to ensure people upgrading don't
get stung

This seems pretty kludgey to me. I wouldn't want to install GnuWin32
utilities on a production system just for the "cp" command, and I don't
know how I would tune holdtime properly for using "copy". And it seems
risky to have defaults that are known to not work reliably.

How about implementing a replacement function for "cp" ourselves? It
seems pretty trivial to do. We could use that on Unixes as well, which
would keep the differences between Win32 and other platforms smaller,
and thus ensure the codepath gets more testing.

(Sorry for jumping into the discussion so late, I didn't follow this
thread earlier, and just read it now in the archives while looking at
the patch.)

If you've heard complaints about any of this from users, I haven't.
AFAIK we're doing this because it *might* cause a problem. Bear in mind
that link is the preferred performance option, not copy. So AFAICS we're
tuning a secondary option on one specific port, without it being a
raised issue and in an area of code that will be superceded in the next
release.

So further embellishments would be a long way down my own priority list,
putting it politely. Yet I have no objections to the suggestion overall;
we have done that already for alter tablespace.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#28

Bruce Momjian

bruce@momjian.us

over 17 years ago

In reply to: Simon Riggs (#27)

Re: [HACKERS] odd output in restore mode

Simon Riggs wrote:

* recommendation to use GnuWin32 cp on Windows
* provide "holdtime" delay, default 0 (on all platforms)
* default stays same on Windows="copy" to ensure people upgrading don't
get stung

This seems pretty kludgey to me. I wouldn't want to install GnuWin32
utilities on a production system just for the "cp" command, and I don't
know how I would tune holdtime properly for using "copy". And it seems
risky to have defaults that are known to not work reliably.

How about implementing a replacement function for "cp" ourselves? It
seems pretty trivial to do. We could use that on Unixes as well, which
would keep the differences between Win32 and other platforms smaller,
and thus ensure the codepath gets more testing.

(Sorry for jumping into the discussion so late, I didn't follow this
thread earlier, and just read it now in the archives while looking at
the patch.)

If you've heard complaints about any of this from users, I haven't.
AFAIK we're doing this because it *might* cause a problem. Bear in mind
that link is the preferred performance option, not copy. So AFAICS we're
tuning a secondary option on one specific port, without it being a
raised issue and in an area of code that will be superceded in the next
release.

So further embellishments would be a long way down my own priority list,
putting it politely. Yet I have no objections to the suggestion overall;
we have done that already for alter tablespace.

OK, based on these observations I think we need to learn more about the
issues before making any changes to our code.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#29

Martin Zaun

Martin.Zaun@Sun.COM

over 17 years ago

In reply to: Bruce Momjian (#28)

Re: [PATCHES] odd output in restore mode

Below my comments on the CommitFest patch:
pg_standby minor changes for Windows

Simon, I'm sorry you got me, a Postgres newbie, signed up for
reviewing your patch ;)

To start with, I'm not quite sure of the status of this patch
since Bruce's last comment on the -patches alias:

Bruce Momjian wrote:

OK, based on these observations I think we need to learn more about the
issues before making any changes to our code.

From easy to difficult:

1. Issues with applying the patch to CVS HEAD:

The second file in the patch
   Index: doc/src/sgml/standby.sgml
appears to be misnamed -- the existing file in HEAD is
   Index: doc/src/sgml/pgstandby.sgml

However, still had issues after fixing the file name:

md@Garu:~/pg/pgsql$ patch -c -p0 < ../pg_standby.patch
patching file contrib/pg_standby/pg_standby.c
patching file doc/src/sgml/pgstandby.sgml
Hunk #1 FAILED at 136.
Hunk #2 FAILED at 168.
Hunk #3 FAILED at 245.
Hunk #4 FAILED at 255.
4 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/pgstandby.sgml.rej

2. Missing description for new command-line options in pgstandby.sgml

Simon Riggs wrote:

Patch implements
* recommendation to use GnuWin32 cp on Windows

Saw that in the changes to pgstandby.sgml, and looks ok to me, but:
- no description of the proposed new command-line options -h and -p?

3. No coding style issues seen

Just one comment: the logic that selects the actual restore command to
be used has moved from CustomizableInitialize() to main() -- a matter
of personal taste, perhaps. But in my view the:
+ the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addresses by the
patch: The user-selected Win32 "mklink" command mode is never applied
due to a missing 'break' in CustomizableInitialize():

switch (restoreCommandType)
{
case RESTORE_COMMAND_WIN32_MKLINK:
SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
case RESTORE_COMMAND_WIN32_COPY:
SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
break;

A similar behaviour on Non-Win32 platforms where the user-selected
"ln" may be silently changed to "cp" in main():

#if HAVE_WORKING_LINK
restoreCommandType = RESTORE_COMMAND_LN;
#else
restoreCommandType = RESTORE_COMMAND_CP;
#endif

If both Win32/Non-Win32 cases reflect the intended behaviour:
- I'd prefer a code comment in the above case-fall-through,
- suggest a message to the user about the ignored "ln" / "mklink",
- observe that the logic to override of the '-l' option is now in two
places: CustomizableInitialize() and main().

5. Minor wording issue in usage message on new '-p' option

I was wondering if the "always" in the usage text
fprintf(stderr, " -p always uses GNU compatible 'cp' command on all platforms\n");
is too strong, since multiple restore command options overwrite each
other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".

6. Minor code comment suggestion

Unrelated to this patch, I wonder if the code comments on all four
time-related vars better read "seconds" instead of "amount of time":
int sleeptime = 5; /* amount of time to sleep between file checks */
int holdtime = 0; /* amount of time to wait once file appears full */
int waittime = -1; /* how long we have been waiting, -1 no wait
* yet */
int maxwaittime = 0; /* how long are we prepared to wait for? */

7. Question: benefits of separate holdtime option from sleeptime?

Simon Riggs wrote:

* provide "holdtime" delay, default 0 (on all platforms)

Going back on the hackers+patches emails and parsing the code
comments, I'm sorry if I missed that, but I'm not sure I've understood
the exact tuning benefits that introducing the new holdtime option
provides over using the existing sleeptime, as it's been the case
(just on Win32 only).

8. Unresolved question of implementing now/later a "cp" replacement

Simon Riggs wrote:

On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote:

This seems pretty kludgey to me. I wouldn't want to install GnuWin32
utilities on a production system just for the "cp" command, and I don't
know how I would tune holdtime properly for using "copy". And it seems
risky to have defaults that are known to not work reliably.

How about implementing a replacement function for "cp" ourselves? It
seems pretty trivial to do. We could use that on Unixes as well, which
would keep the differences between Win32 and other platforms smaller,
and thus ensure the codepath gets more testing.

If you've heard complaints about any of this from users, I haven't.
AFAIK we're doing this because it *might* cause a problem. Bear in mind
that link is the preferred performance option, not copy. So AFAICS we're
tuning a secondary option on one specific port, without it being a
raised issue and in an area of code that will be superceded in the next
release.

So further embellishments would be a long way down my own priority list,
putting it politely. Yet I have no objections to the suggestion overall;
we have done that already for alter tablespace.

Don't have much to add to the whether/now/later question of providing
a "cp" replacement, but I guess the existing command-line options and
documentation wouldn't have to change with our own "cp" replacement
while the newly proposed '-h' and '-p' would become moot then, right?

Regards,
Martin

#30

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Martin Zaun (#29)

Re: [PATCHES] odd output in restore mode

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

1. Issues with applying the patch to CVS HEAD:

Sounds awful. Thanks for the review, will fix.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#31

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Martin Zaun (#29)

Re: [PATCHES] odd output in restore mode

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

1. Issues with applying the patch to CVS HEAD:

For me, the patch applies cleanly to CVS HEAD.

I do notice that there are two files "standby.sgml" and
"pgstandby.sgml". I can't see where "standby.sgml" comes from, but I
haven't created it; perhaps it is a relic of the SGML build process.
I've recreated my source tree since I wrote the patch also. Weird.

I'll redo the patch so it points at pgstandby.sgml, which is the one
thats listed as being in the main source tree.

2. Missing description for new command-line options in pgstandby.sgml

- no description of the proposed new command-line options -h and -p?

These are done. The patch issues have missed those hunks.

3. No coding style issues seen

Just one comment: the logic that selects the actual restore command to
be used has moved from CustomizableInitialize() to main() -- a matter
of personal taste, perhaps. But in my view the:
+ the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

Thanks

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before

Well spotted. I don't claim to test this for Windows.

5. Minor wording issue in usage message on new '-p' option

I was wondering if the "always" in the usage text
fprintf(stderr, " -p always uses GNU compatible 'cp' command on all platforms\n");
is too strong, since multiple restore command options overwrite each
other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".

I was assuming you don't turn the switch off again immediately
afterwards.

6. Minor code comment suggestion

Unrelated to this patch, I wonder if the code comments on all four
time-related vars better read "seconds" instead of "amount of time":
int sleeptime = 5; /* amount of time to sleep between file checks */
int holdtime = 0; /* amount of time to wait once file appears full */
int waittime = -1; /* how long we have been waiting, -1 no wait
* yet */
int maxwaittime = 0; /* how long are we prepared to wait for? */

As you say, unrelated to the patch.

7. Question: benefits of separate holdtime option from sleeptime?

Simon Riggs wrote:

* provide "holdtime" delay, default 0 (on all platforms)

Going back on the hackers+patches emails and parsing the code
comments, I'm sorry if I missed that, but I'm not sure I've understood
the exact tuning benefits that introducing the new holdtime option
provides over using the existing sleeptime, as it's been the case
(just on Win32 only).

This is central to the patch, since the complaint was about the delay
introduced by doing that previously.

8. Unresolved question of implementing now/later a "cp" replacement

The patch implements what's been agreed.

I'm not rewriting "cp", for reasons already discussed.

Not a comment to you Martin, but it's fairly clear that I'm not
maintaining this correctly for Windows. I've never claimed to have
tested this on Windows, and only included Windows related items as
requested by others. I need to make it clear that I'm not going to
maintain it at all, for Windows. If others wish to report Windows issues
then they can suggest appropriate fixes and test them also.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#32

Heikki Linnakangas

heikki@enterprisedb.com

over 17 years ago

In reply to: Simon Riggs (#31)

Re: [PATCHES] odd output in restore mode

Simon Riggs wrote:

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

8. Unresolved question of implementing now/later a "cp" replacement

The patch implements what's been agreed.

I'm not rewriting "cp", for reasons already discussed.

Not a comment to you Martin, but it's fairly clear that I'm not
maintaining this correctly for Windows. I've never claimed to have
tested this on Windows, and only included Windows related items as
requested by others. I need to make it clear that I'm not going to
maintain it at all, for Windows. If others wish to report Windows issues
then they can suggest appropriate fixes and test them also.

Hmm. I just realized that replacing the "cp" command within pg_standby
won't help at all. The problem is with the command that copies the files
*to* the archivelocation that pg_standby polls, not with the copy
pg_standby does from archivelocation to pg_xlog. And we don't have much
control over that.

We really need a more reliable way of detecting that a file has been
fully copied. One simple improvement would be to check the xlp_magic
field of the last page, though it still wouldn't be bullet-proof.

Do the commands that preallocate the space keep the file exclusively
locked during the copy? If they do, shouldn't we get an error in trying
to run the restore copy command, and retry after the 1s sleep in
RestoreWALFileForRecovery? Though if the archive location is a samba
mount or something, I guess we can't rely on Windows-style exclusive
locking.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#33

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 17 years ago

In reply to: Heikki Linnakangas (#32)

Re: [PATCHES] odd output in restore mode

"Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

We really need a more reliable way of detecting that a file has been

fully copied.

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed. I think
that this even works on Windows. Could that just be documented as a
strong recommendation for the archive script?

-Kevin

#34

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Kevin Grittner (#33)

Re: [PATCHES] odd output in restore mode

Kevin Grittner wrote:

"Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

We really need a more reliable way of detecting that a file has been

fully copied.

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed. I think
that this even works on Windows. Could that just be documented as a
strong recommendation for the archive script?

Needs testing at least. If it does in fact work then we can just adjust
the docs and be done - or maybe provide a .bat file or perl script that
would work as na archive_command on Windows.

cheers

andrew

#35

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Heikki Linnakangas (#32)

Re: [PATCHES] odd output in restore mode

On Wed, 2008-07-23 at 21:38 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

8. Unresolved question of implementing now/later a "cp" replacement

The patch implements what's been agreed.

I'm not rewriting "cp", for reasons already discussed.

Not a comment to you Martin, but it's fairly clear that I'm not
maintaining this correctly for Windows. I've never claimed to have
tested this on Windows, and only included Windows related items as
requested by others. I need to make it clear that I'm not going to
maintain it at all, for Windows. If others wish to report Windows issues
then they can suggest appropriate fixes and test them also.

Hmm. I just realized that replacing the "cp" command within pg_standby
won't help at all. The problem is with the command that copies the files
*to* the archivelocation that pg_standby polls, not with the copy
pg_standby does from archivelocation to pg_xlog. And we don't have much
control over that.

We really need a more reliable way of detecting that a file has been
fully copied. One simple improvement would be to check the xlp_magic
field of the last page, though it still wouldn't be bullet-proof.

Do the commands that preallocate the space keep the file exclusively
locked during the copy? If they do, shouldn't we get an error in trying
to run the restore copy command, and retry after the 1s sleep in
RestoreWALFileForRecovery? Though if the archive location is a samba
mount or something, I guess we can't rely on Windows-style exclusive
locking.

With respect, I need to refer you back to the my last paragraph above.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#36

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Martin Zaun (#29)

Re: [PATCHES] odd output in restore mode

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

reviewing your patch

Current status is this:

* My understanding is that Dave and Andrew (and therefore Simon) think
the approach proposed here is an acceptable one. Heikki disagrees and
wants different approach. Perhaps I misunderstand.

* Patch needs work to complete the proposed approach

* I'm willing to change the patch, but not able to test it on Windows.

Is there someone able to test the patch, if I make the changes? If not,
we should just kick this out of the CommitFest queue now and be done. If
nobody cares enough about this issue to test a fix, we shouldn't bother.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#37

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#36)

Re: [PATCHES] odd output in restore mode

Simon Riggs <simon@2ndquadrant.com> writes:

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

reviewing your patch

Current status is this:
* My understanding is that Dave and Andrew (and therefore Simon) think
the approach proposed here is an acceptable one. Heikki disagrees and
wants different approach. Perhaps I misunderstand.
* Patch needs work to complete the proposed approach
* I'm willing to change the patch, but not able to test it on Windows.

I thought the latest conclusion was that changing the behavior of
pg_standby itself wouldn't address the problem anyway, and that what we
need is just a docs patch recommending that people use safe copying
methods in their scripts that copy to the archive area?

regards, tom lane

#38

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#37)

Re: [PATCHES] odd output in restore mode

On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

reviewing your patch

Current status is this:
* My understanding is that Dave and Andrew (and therefore Simon) think
the approach proposed here is an acceptable one. Heikki disagrees and
wants different approach. Perhaps I misunderstand.
* Patch needs work to complete the proposed approach
* I'm willing to change the patch, but not able to test it on Windows.

I thought the latest conclusion was that changing the behavior of
pg_standby itself wouldn't address the problem anyway, and that what we
need is just a docs patch recommending that people use safe copying
methods in their scripts that copy to the archive area?

Plus the rest of this patch, which is really very simple.

pg_standby currently waits (on Windows) for the sleep time. We agreed
that this sleep would be on by default, but optional.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#39

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Simon Riggs (#38)

Re: [PATCHES] odd output in restore mode

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:

I thought the latest conclusion was that changing the behavior of
pg_standby itself wouldn't address the problem anyway, and that what we
need is just a docs patch recommending that people use safe copying
methods in their scripts that copy to the archive area?

Plus the rest of this patch, which is really very simple.

Why? AFAICT the patch is just a kluge that adds user-visible complexity
without providing a solution that's actually sure to work.

regards, tom lane

#40

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#39)

Re: [PATCHES] odd output in restore mode

On Fri, 2008-07-25 at 16:58 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:

I thought the latest conclusion was that changing the behavior of
pg_standby itself wouldn't address the problem anyway, and that what we
need is just a docs patch recommending that people use safe copying
methods in their scripts that copy to the archive area?

Plus the rest of this patch, which is really very simple.

Why? AFAICT the patch is just a kluge that adds user-visible complexity
without providing a solution that's actually sure to work.

First, I'm not the one objecting to the current behaviour.

Currently, there is a wait in there that can be removed if you use a
copy utility that sets size after it does a copy. So we agreed to make
it optional (at PGCon).

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#41

Heikki Linnakangas

heikki@enterprisedb.com

over 17 years ago

In reply to: Andrew Dunstan (#34)

Re: [PATCHES] odd output in restore mode

Andrew Dunstan wrote:

Kevin Grittner wrote:

"Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

We really need a more reliable way of detecting that a file has been
fully copied.

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed. I think
that this even works on Windows. Could that just be documented as a
strong recommendation for the archive script?

Needs testing at least. If it does in fact work then we can just adjust
the docs and be done

Yeah.

- or maybe provide a .bat file or perl script that
would work as na archive_command on Windows.

We're not talking about archive_command. We're talking about the thing
that copies files to the directory that pg_standby polls.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#42

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Heikki Linnakangas (#41)

Re: [PATCHES] odd output in restore mode

Heikki Linnakangas wrote:

Andrew Dunstan wrote:

- or maybe provide a .bat file or perl script that would work as na
archive_command on Windows.

We're not talking about archive_command. We're talking about the thing
that copies files to the directory that pg_standby polls.

Er, that's what the archive_command is. Look at the pg_standby docs and
you'll see that that's where we're currently recommending use of windows
copy. Perhaps you're confusing this with the restore_command?

cheers

andrew

#43

Heikki Linnakangas

heikki@enterprisedb.com

over 17 years ago

In reply to: Andrew Dunstan (#42)

Re: [PATCHES] odd output in restore mode

Andrew Dunstan wrote:

Heikki Linnakangas wrote:

Andrew Dunstan wrote:

- or maybe provide a .bat file or perl script that would work as na
archive_command on Windows.

We're not talking about archive_command. We're talking about the thing
that copies files to the directory that pg_standby polls.

Er, that's what the archive_command is. Look at the pg_standby docs and
you'll see that that's where we're currently recommending use of windows
copy. Perhaps you're confusing this with the restore_command?

Oh, right. I was thinking that archive_command copies the files to an
archive location, and there's yet another process copying files from
there to the directory pg_standby polls. But indeed in the simple
configuration, archive_command is the command that we're interested in.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#44

Greg Smith

gsmith@gregsmith.com

over 17 years ago

In reply to: Kevin Grittner (#33)

Re: [PATCHES] odd output in restore mode

On Wed, 23 Jul 2008, Kevin Grittner wrote:

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed. I think
that this even works on Windows. Could that just be documented as a
strong recommendation for the archive script?

This is exactly what I always do. I think the way cp is shown in the
examples promotes what's really a bad practice for lots of reasons, this
particular problem being just one of them.

I've been working on an improved archive_command shell script that I
expect to submit for comments and potential inclusion in the documentation
as a better base for other people to build on. This is one of the options
for how it can operate. It would be painful but not impossible to convert
a subset of that script to run under Windows as well, at least enough to
cover this particular issue.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#45

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Greg Smith (#44)

Re: [PATCHES] odd output in restore mode

Greg Smith wrote:

On Wed, 23 Jul 2008, Kevin Grittner wrote:

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed. I think
that this even works on Windows. Could that just be documented as a
strong recommendation for the archive script?

This is exactly what I always do. I think the way cp is shown in the
examples promotes what's really a bad practice for lots of reasons,
this particular problem being just one of them.

I've been working on an improved archive_command shell script that I
expect to submit for comments and potential inclusion in the
documentation as a better base for other people to build on. This is
one of the options for how it can operate. It would be painful but not
impossible to convert a subset of that script to run under Windows as
well, at least enough to cover this particular issue.

A Perl script using the (standard) File::Copy module along with the
builtin function rename() should be moderately portable. It would to be
nice not to have to maintain two scripts.

cheers

andrew

#46

Heikki Linnakangas

heikki@enterprisedb.com

over 17 years ago

In reply to: Andrew Dunstan (#45)

Re: [PATCHES] odd output in restore mode

Andrew Dunstan wrote:

Greg Smith wrote:

On Wed, 23 Jul 2008, Kevin Grittner wrote:

In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed. I think
that this even works on Windows. Could that just be documented as a
strong recommendation for the archive script?

This is exactly what I always do. I think the way cp is shown in the
examples promotes what's really a bad practice for lots of reasons,
this particular problem being just one of them.

I've been working on an improved archive_command shell script that I
expect to submit for comments and potential inclusion in the
documentation as a better base for other people to build on. This is
one of the options for how it can operate. It would be painful but not
impossible to convert a subset of that script to run under Windows as
well, at least enough to cover this particular issue.

A Perl script using the (standard) File::Copy module along with the
builtin function rename() should be moderately portable. It would to be
nice not to have to maintain two scripts.

It's also not very nice to require a Perl installation on Windows, just
for a replacement of Copy. Would a simple .bat script work?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#47

Martin Zaun

Martin.Zaun@Sun.COM

over 17 years ago

In reply to: Heikki Linnakangas (#46)

Re: [PATCHES] odd output in restore mode

Heikki Linnakangas wrote:

Andrew Dunstan wrote:

Greg Smith wrote:

On Wed, 23 Jul 2008, Kevin Grittner wrote:

I've been working on an improved archive_command shell script that I
expect to submit for comments and potential inclusion in the
documentation as a better base for other people to build on. This is
one of the options for how it can operate. It would be painful but
not impossible to convert a subset of that script to run under
Windows as well, at least enough to cover this particular issue.

A Perl script using the (standard) File::Copy module along with the
builtin function rename() should be moderately portable. It would to
be nice not to have to maintain two scripts.

It's also not very nice to require a Perl installation on Windows, just
for a replacement of Copy. Would a simple .bat script work?

With these avenues to be explored, can the pg_standby patch on the
CommitFest wiki be moved to the "Returned with Feedback" section?

Regards,
Martin

#48

Heikki Linnakangas

heikki@enterprisedb.com

over 17 years ago

In reply to: Martin Zaun (#47)

Re: [PATCHES] odd output in restore mode

Martin Zaun wrote:

Heikki Linnakangas wrote:

Andrew Dunstan wrote:

Greg Smith wrote:

On Wed, 23 Jul 2008, Kevin Grittner wrote:

I've been working on an improved archive_command shell script that I
expect to submit for comments and potential inclusion in the
documentation as a better base for other people to build on. This is
one of the options for how it can operate. It would be painful but
not impossible to convert a subset of that script to run under
Windows as well, at least enough to cover this particular issue.

A Perl script using the (standard) File::Copy module along with the
builtin function rename() should be moderately portable. It would to
be nice not to have to maintain two scripts.

It's also not very nice to require a Perl installation on Windows,
just for a replacement of Copy. Would a simple .bat script work?

With these avenues to be explored, can the pg_standby patch on the
CommitFest wiki be moved to the "Returned with Feedback" section?

Yes, I think we can conclude that we don't want this patch as it is.
Instead, we want a documentation patch that describes the problem,
mentioning that GNU cp is safe, or you can use the copy+rename trick.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#49

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Heikki Linnakangas (#48)

Re: [PATCHES] odd output in restore mode

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Martin Zaun wrote:

With these avenues to be explored, can the pg_standby patch on the
CommitFest wiki be moved to the "Returned with Feedback" section?

Yes, I think we can conclude that we don't want this patch as it is.
Instead, we want a documentation patch that describes the problem,
mentioning that GNU cp is safe, or you can use the copy+rename trick.

Right, after which we remove the presently hacked-in delay.

I've updated the commitfest page accordingly.

regards, tom lane

#50

Simon Riggs

simon@2ndquadrant.com

over 17 years ago

In reply to: Tom Lane (#49)

Re: [HACKERS]odd output in restore mode

On Thu, 2008-07-31 at 12:32 -0400, Tom Lane wrote:

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Martin Zaun wrote:

With these avenues to be explored, can the pg_standby patch on the
CommitFest wiki be moved to the "Returned with Feedback" section?

Yes, I think we can conclude that we don't want this patch as it is.
Instead, we want a documentation patch that describes the problem,
mentioning that GNU cp is safe, or you can use the copy+rename trick.

Right, after which we remove the presently hacked-in delay.

I've updated the commitfest page accordingly.

Well, this is a strange conclusion, leaving me slightly bemused.

The discussion between Andrew and I at PGcon concluded that we would
* document which other tools to use
* remove the delay

Now we have rejected the patch which does that, but then re-requested
the exact same thing again.

The patch interprets "remove the delay" as "remove the delay in a way
which will not screw up existing users of pg_standby when they upgrade".
Doing that requires us to have a configurable delay, which defaults to
the current behaviour, but that can be set to zero (the recommended
way). Which is what the patch implements.

Andrew, Heikki: ISTM its time to just make the changes yourselves. This
is just going round and round to no benefit. This doesn't warrant such a
long discussion and review process.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#51

Andrew Dunstan

andrew@dunslane.net

over 17 years ago

In reply to: Simon Riggs (#50)

Re: [HACKERS]odd output in restore mode

Simon Riggs wrote:

Well, this is a strange conclusion, leaving me slightly bemused.

The discussion between Andrew and I at PGcon concluded that we would
* document which other tools to use
* remove the delay

Now we have rejected the patch which does that, but then re-requested
the exact same thing again.

The patch interprets "remove the delay" as "remove the delay in a way
which will not screw up existing users of pg_standby when they upgrade".
Doing that requires us to have a configurable delay, which defaults to
the current behaviour, but that can be set to zero (the recommended
way). Which is what the patch implements.

Andrew, Heikki: ISTM its time to just make the changes yourselves. This
is just going round and round to no benefit. This doesn't warrant such a
long discussion and review process.

You ought to know by now that the length and ferocity of the discussion
bears no relation at all to the importance of the subject ;-)

Personally, I think it's reasonable to provide the delay as long as it's
switchable, although I would have preferred zero to be the default. If
we remove it altogether then we force bigger changes on people who are
currently using Windows copy. But I can live with that since changing
their archive_command is the better path by far anyway, either to use
Gnu cp or the copy / rename trick.

cheers

andrew

#52

Bruce Momjian

bruce@momjian.us

about 17 years ago

In reply to: Andrew Dunstan (#51)

Re: [PATCHES] odd output in restore mode

Have we made any progress on this, namely better documentation and
removing the Win32 delay code?

---------------------------------------------------------------------------

Andrew Dunstan wrote:

Simon Riggs wrote:

Well, this is a strange conclusion, leaving me slightly bemused.

The discussion between Andrew and I at PGcon concluded that we would
* document which other tools to use
* remove the delay

Now we have rejected the patch which does that, but then re-requested
the exact same thing again.

The patch interprets "remove the delay" as "remove the delay in a way
which will not screw up existing users of pg_standby when they upgrade".
Doing that requires us to have a configurable delay, which defaults to
the current behaviour, but that can be set to zero (the recommended
way). Which is what the patch implements.

Andrew, Heikki: ISTM its time to just make the changes yourselves. This
is just going round and round to no benefit. This doesn't warrant such a
long discussion and review process.

You ought to know by now that the length and ferocity of the discussion
bears no relation at all to the importance of the subject ;-)

Personally, I think it's reasonable to provide the delay as long as it's
switchable, although I would have preferred zero to be the default. If
we remove it altogether then we force bigger changes on people who are
currently using Windows copy. But I can live with that since changing
their archive_command is the better path by far anyway, either to use
Gnu cp or the copy / rename trick.

cheers

andrew

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#53

Andrew Dunstan

andrew@dunslane.net

about 17 years ago

In reply to: Bruce Momjian (#52)

Re: [PATCHES] odd output in restore mode

I have a fairly large TODO list, and Simon has thrown in the towel (and
I imagine he also has a large TODO list).

anyone else want to step in?

cheers

andrew

Bruce Momjian wrote:

Show quoted text

Have we made any progress on this, namely better documentation and
removing the Win32 delay code?

---------------------------------------------------------------------------

Andrew Dunstan wrote:

Simon Riggs wrote:

Well, this is a strange conclusion, leaving me slightly bemused.

The discussion between Andrew and I at PGcon concluded that we would
* document which other tools to use
* remove the delay

Now we have rejected the patch which does that, but then re-requested
the exact same thing again.

The patch interprets "remove the delay" as "remove the delay in a way
which will not screw up existing users of pg_standby when they upgrade".
Doing that requires us to have a configurable delay, which defaults to
the current behaviour, but that can be set to zero (the recommended
way). Which is what the patch implements.

Andrew, Heikki: ISTM its time to just make the changes yourselves. This
is just going round and round to no benefit. This doesn't warrant such a
long discussion and review process.

You ought to know by now that the length and ferocity of the discussion
bears no relation at all to the importance of the subject ;-)

Personally, I think it's reasonable to provide the delay as long as it's
switchable, although I would have preferred zero to be the default. If
we remove it altogether then we force bigger changes on people who are
currently using Windows copy. But I can live with that since changing
their archive_command is the better path by far anyway, either to use
Gnu cp or the copy / rename trick.

cheers

andrew

#54

Bruce Momjian

bruce@momjian.us

about 17 years ago

In reply to: Martin Zaun (#29)

Re: [PATCHES] odd output in restore mode

Martin Zaun wrote:

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addresses by the
patch: The user-selected Win32 "mklink" command mode is never applied
due to a missing 'break' in CustomizableInitialize():

switch (restoreCommandType)
{
case RESTORE_COMMAND_WIN32_MKLINK:
SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
case RESTORE_COMMAND_WIN32_COPY:
SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
break;

I have added the missing 'break' to CVS HEAD; thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#55

Bruce Momjian

bruce@momjian.us

about 17 years ago

In reply to: Martin Zaun (#29)

1 attachment(s)

Re: [PATCHES] odd output in restore mode

Since this patch was rejected, I have added the attached documentation
to pg_standby to mention the sleep() we do.

---------------------------------------------------------------------------

Martin Zaun wrote:

Below my comments on the CommitFest patch:
pg_standby minor changes for Windows

Simon, I'm sorry you got me, a Postgres newbie, signed up for
reviewing your patch ;)

To start with, I'm not quite sure of the status of this patch
since Bruce's last comment on the -patches alias:

Bruce Momjian wrote:

OK, based on these observations I think we need to learn more about the
issues before making any changes to our code.

From easy to difficult:

1. Issues with applying the patch to CVS HEAD:
The second file in the patch
Index: doc/src/sgml/standby.sgml
appears to be misnamed -- the existing file in HEAD is
Index: doc/src/sgml/pgstandby.sgml
However, still had issues after fixing the file name:

md@Garu:~/pg/pgsql$ patch -c -p0 < ../pg_standby.patch
patching file contrib/pg_standby/pg_standby.c
patching file doc/src/sgml/pgstandby.sgml
Hunk #1 FAILED at 136.
Hunk #2 FAILED at 168.
Hunk #3 FAILED at 245.
Hunk #4 FAILED at 255.
4 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/pgstandby.sgml.rej

2. Missing description for new command-line options in pgstandby.sgml

Simon Riggs wrote:

Patch implements
* recommendation to use GnuWin32 cp on Windows

Saw that in the changes to pgstandby.sgml, and looks ok to me, but:
- no description of the proposed new command-line options -h and -p?

3. No coding style issues seen

Just one comment: the logic that selects the actual restore command to
be used has moved from CustomizableInitialize() to main() -- a matter
of personal taste, perhaps. But in my view the:
+ the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addresses by the
patch: The user-selected Win32 "mklink" command mode is never applied
due to a missing 'break' in CustomizableInitialize():

switch (restoreCommandType)
{
case RESTORE_COMMAND_WIN32_MKLINK:
SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
case RESTORE_COMMAND_WIN32_COPY:
SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
break;

A similar behaviour on Non-Win32 platforms where the user-selected
"ln" may be silently changed to "cp" in main():

#if HAVE_WORKING_LINK
restoreCommandType = RESTORE_COMMAND_LN;
#else
restoreCommandType = RESTORE_COMMAND_CP;
#endif

If both Win32/Non-Win32 cases reflect the intended behaviour:
- I'd prefer a code comment in the above case-fall-through,
- suggest a message to the user about the ignored "ln" / "mklink",
- observe that the logic to override of the '-l' option is now in two
places: CustomizableInitialize() and main().

5. Minor wording issue in usage message on new '-p' option

I was wondering if the "always" in the usage text
fprintf(stderr, " -p always uses GNU compatible 'cp' command on all platforms\n");
is too strong, since multiple restore command options overwrite each
other, e.g. "-p -c" applies Windows's "copy" instead of Gnu's "cp".

6. Minor code comment suggestion

Unrelated to this patch, I wonder if the code comments on all four
time-related vars better read "seconds" instead of "amount of time":
int sleeptime = 5; /* amount of time to sleep between file checks */
int holdtime = 0; /* amount of time to wait once file appears full */
int waittime = -1; /* how long we have been waiting, -1 no wait
* yet */
int maxwaittime = 0; /* how long are we prepared to wait for? */

7. Question: benefits of separate holdtime option from sleeptime?

Simon Riggs wrote:

* provide "holdtime" delay, default 0 (on all platforms)

Going back on the hackers+patches emails and parsing the code
comments, I'm sorry if I missed that, but I'm not sure I've understood
the exact tuning benefits that introducing the new holdtime option
provides over using the existing sleeptime, as it's been the case
(just on Win32 only).

8. Unresolved question of implementing now/later a "cp" replacement

Simon Riggs wrote:

On Tue, 2008-07-01 at 13:44 +0300, Heikki Linnakangas wrote:

This seems pretty kludgey to me. I wouldn't want to install GnuWin32
utilities on a production system just for the "cp" command, and I don't
know how I would tune holdtime properly for using "copy". And it seems
risky to have defaults that are known to not work reliably.

How about implementing a replacement function for "cp" ourselves? It
seems pretty trivial to do. We could use that on Unixes as well, which
would keep the differences between Win32 and other platforms smaller,
and thus ensure the codepath gets more testing.

If you've heard complaints about any of this from users, I haven't.
AFAIK we're doing this because it *might* cause a problem. Bear in mind
that link is the preferred performance option, not copy. So AFAICS we're
tuning a secondary option on one specific port, without it being a
raised issue and in an area of code that will be superceded in the next
release.

So further embellishments would be a long way down my own priority list,
putting it politely. Yet I have no objections to the suggestion overall;
we have done that already for alter tablespace.

Don't have much to add to the whether/now/later question of providing
a "cp" replacement, but I guess the existing command-line options and
documentation wouldn't have to change with our own "cp" replacement
while the newly proposed '-h' and '-p' would become moot then, right?

Regards,
Martin

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload

Index: pgstandby.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/pgstandby.sgml,v
retrieving revision 2.5
diff -c -r2.5 pgstandby.sgml
*** pgstandby.sgml	7 May 2008 18:48:40 -0000	2.5
--- pgstandby.sgml	15 Dec 2008 22:04:09 -0000
***************
*** 295,301 ****
    </itemizedlist>
  
    <para>
!    Since the Windows example uses <literal>copy</> at both ends, either
     or both servers might be accessing the archive directory across the
     network.
    </para>
--- 295,310 ----
    </itemizedlist>
  
    <para>
!    The <literal>copy</> command on Windows sets the final file size
!    before the file is completely copied, which would ordinarly confuse
!    <application>pg_standby</application>.  Therefore
!    <application>pg_standby</application> waits <literal>sleeptime</>
!    seconds once it sees the proper file size.  GNUWin32's <literal>cp</>
!    sets the file size only after the file copy is complete.
!   </para>
! 
!   <para>
!    Using the Since the Windows example uses <literal>copy</> at both ends, either
     or both servers might be accessing the archive directory across the
     network.
    </para>

#56

Magnus Hagander

magnus@hagander.net

about 17 years ago

In reply to: Bruce Momjian (#54)

Re: [PATCHES] odd output in restore mode

Bruce Momjian wrote:

Martin Zaun wrote:

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addresses by the
patch: The user-selected Win32 "mklink" command mode is never applied
due to a missing 'break' in CustomizableInitialize():

switch (restoreCommandType)
{
case RESTORE_COMMAND_WIN32_MKLINK:
SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
case RESTORE_COMMAND_WIN32_COPY:
SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
break;

I have added the missing 'break' to CVS HEAD; thanks.

Why no backpatch to 8.3? Seems like a clear bugfix to me.

//Magnus

#57

Bruce Momjian

bruce@momjian.us

about 17 years ago

In reply to: Magnus Hagander (#56)

Re: [PATCHES] odd output in restore mode

Magnus Hagander wrote:

Bruce Momjian wrote:

Martin Zaun wrote:

4. Issue: missing break in switch, silent override of '-l' argument?

This behaviour has been in there before and is not addresses by the
patch: The user-selected Win32 "mklink" command mode is never applied
due to a missing 'break' in CustomizableInitialize():

switch (restoreCommandType)
{
case RESTORE_COMMAND_WIN32_MKLINK:
SET_RESTORE_COMMAND("mklink", WALFilePath, xlogFilePath);
case RESTORE_COMMAND_WIN32_COPY:
SET_RESTORE_COMMAND("copy", WALFilePath, xlogFilePath);
break;

I have added the missing 'break' to CVS HEAD; thanks.

Why no backpatch to 8.3? Seems like a clear bugfix to me.

I knew that was going to be asked. At this point I am pulling comments
from rejected patches into CVS commits; these are not even submitted
patches. I am not comfortable backpatching anything when using that
system because obviously no one else even cared enough to submit a patch
for it, let alone test it. If someone wants to batckpatch this or
submit a patch to be backpatched, that is fine.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#58

Bruce Momjian

bruce@momjian.us

about 17 years ago

In reply to: Tom Lane (#49)

Re: [PATCHES] odd output in restore mode

Tom Lane wrote:

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Martin Zaun wrote:

With these avenues to be explored, can the pg_standby patch on the
CommitFest wiki be moved to the "Returned with Feedback" section?

Yes, I think we can conclude that we don't want this patch as it is.
Instead, we want a documentation patch that describes the problem,
mentioning that GNU cp is safe, or you can use the copy+rename trick.

Right, after which we remove the presently hacked-in delay.

I've updated the commitfest page accordingly.

I have documented the sleep() call and that GNU cp is safe, but did not
remove the delay, nor mention copy+rename.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#59

Simon Riggs

simon@2ndQuadrant.com

about 17 years ago

In reply to: Bruce Momjian (#57)

Re: [PATCHES] odd output in restore mode

On Mon, 2008-12-15 at 17:10 -0500, Bruce Momjian wrote:

Why no backpatch to 8.3? Seems like a clear bugfix to me.

I knew that was going to be asked.

8.3 is really where this is needed. 8.4 has almost no need of this.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support