pg_standby for 8.2 (with last restart point)
Hi all,
For PG versions < 8.3 (specifically 8.2) I wanted the %r parameter to be
substituted by the last restart point, just as the recovery code does in >
8.3. I assumed there would be objections to it (else it would have already
been there in 8.2.x), so started looking for workarounds. After a few ideas,
I settled with using the output of pg_controldata.
Here's what I have done: I execute pg_controldata and parse it's output
to extract the same information as xlog.c provides for %r in versions > 8.3.
Then I rebuild the XLog filename, just like xlog.c, and emit it from the
script. All this is done in a perl script (at the end of this mail).
My next step is: use this script in the restore_command to provide the
%r parameter to pg_standby, like so:
restore_command = 'pg_standby -c -d -s 5 -w 0 -t
/tmp/pg_standby.trigger.5433 ../wal_archive/ %f %p `perl
/home/gurjeet/dev/last_restart_point.pl` 2>> pg_standby.log'
I have tested this script using the following restore_command, on a HEAD
version:
restore_command = 'echo before `perl
/home/gurjeet/dev/last_restart_point.pl` >> pg_standby.log && pg_standby -c
-d -s 5 -w 0 -t /tmp/pg_standby.trigger.5433 ../wal_archive/ %f %p %r 2>>
pg_standby.log && echo after `perl /home/gurjeet/dev/last_restart_point.pl`
pg_standby.log'
Using the above restore_command, I can see that my script is able to
detect the change in the restart point (%r) just as soon as the server
updates it. Here's a snippet:
<snip>
...
Keep archive history : 000000010000000100000021 and later
running restore : OK
after 000000010000000100000021
before 000000010000000100000045
Trigger file : /tmp/pg_standby.trigger.5433
Waiting for WAL file : 000000010000000100000047
WAL file path : ../wal_archive//000000010000000100000047
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : cp "../wal_archive//000000010000000100000047"
"pg_xlog/RECOVERYXLOG"
Keep archive history : 000000010000000100000045 and later
running restore : OK
removing "../wal_archive//000000010000000100000025"
removing "../wal_archive//00000001000000010000002D"
removing "../wal_archive//000000010000000100000031"
...
<./snip>
So, is this a safe way of extracting the last restart point for PG < 8.3?
Or would it be possible to make PG<8.3 provide this %r through some patch?
Best regards,
Gurjeet.
Here's the perl script:
<script>
my @text = `pg_controldata .`; # here . represents the PGDATA, since the
server is executing here.
my $line;
my $time_line_id;
my $redo_log_id;
my $redo_rec_off;
my $wal_seg_bytes;
my $redo_log_seg;
foreach $line ( @text )
{
$line = mychomp( $line );
if( $line =~ m/Latest checkpoint's TimeLineID:\s*(([0-9])+)/ )
{
# decimal number
$time_line_id = 0 + $1;
}
if( $line =~ m/Latest checkpoint's REDO
location:\s*(([0-9]|[A-F])+)\/(([0-9]|[A-F])+)/ )
{
# hexadecimal numbers
$redo_log_id = $1;
$redo_rec_off = $3;
}
if( $line =~ m/Bytes per WAL segment:\s*([0-9]+)/ )
{
# decimal number
$wal_seg_bytes = $1;
}
}
$redo_log_seg = sprintf( "%d", hex( $redo_rec_off ) / $wal_seg_bytes );
print "" . sprintf( "%08X%08X%08X", $time_line_id, $redo_log_id,
$redo_log_seg ) . "\n";
# Wrapper around Perl's chomp function
sub mychomp
{
my ( $tmp ) = @_;
chomp( $tmp );
return $tmp;
}
</script>
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
On Fri, 28 Mar 2008, Gurjeet Singh wrote:
For PG versions < 8.3 (specifically 8.2) I wanted the %r parameter to be
substituted by the last restart point, just as the recovery code does in8.3. I assumed there would be objections to it (else it would have
already been there in 8.2.x)
The idea to add this feature didn't show up before 8.2 was released, it
came up during the 8.3 development cycle. This project doesn't make
functional changes to stable releases, that's the reason why 8.2 will
never get patched to add the %r feature.
Cute script though. I know people have asked about simulating this
behavior, and I don't recall a good sample solution being presented before
yours.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, Mar 28, 2008 at 3:56 AM, Greg Smith <gsmith@gregsmith.com> wrote:
On Fri, 28 Mar 2008, Gurjeet Singh wrote:
For PG versions < 8.3 (specifically 8.2) I wanted the %r parameter to be
substituted by the last restart point, just as the recovery code does in8.3. I assumed there would be objections to it (else it would have
already been there in 8.2.x)
The idea to add this feature didn't show up before 8.2 was released, it
came up during the 8.3 development cycle. This project doesn't make
functional changes to stable releases, that's the reason why 8.2 will
never get patched to add the %r feature.
I completely understand that, but still was hoping that we'd change that.
Cute script though. I know people have asked about simulating this
behavior, and I don't recall a good sample solution being presented before
yours.
Thanks.
Is there anybody who sees any problem with this? Specifically, internals
wise, does 8.2 also give the same guarantee 8.3 does w.r.t restart point?
And consequently, is it safe to go ahead with this script in a production
environment?
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
On Fri, 28 Mar 2008, Gurjeet Singh wrote:
This project doesn't make functional changes to stable releases, that's
the reason why 8.2 will never get patched to add the %r feature.I completely understand that, but still was hoping that we'd change that.
Well, then you really don't understand this at all then, so let's work on
that for a bit. http://www.postgresql.org/support/versioning is the
official statement, perhaps some examples will help clarify where and why
the line is where it is.
One of the first patches I ever submitted made a minor change to a contrib
utility used solely for benchmarking (pgbench) that added a useful
feature, only if you passed it the right parameter. That was considered
for a tiny bit before being rejected as a feature change too large to put
into a stable branch.
That was a small change in a utility that should never be run on a
production system. You're trying to get a change made to the code path
people rely on for their *backups*. Good luck with that.
The parable I enjoy pulling out in support of this policy is MySQL bug
#31001:
http://www.mysqlperformanceblog.com/2007/10/04/mysql-quality-of-old-and-new-features/
where they added a seemingly minor optimization to something and
accidentally broke the ability to sort in some cases. There's always a
small risk that comes with any code change, and this is why you don't ever
touch working production code unless you're fixing a bug that's more
troublesome than that risk.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
... That was a small change in a utility that should never be run on a
production system. You're trying to get a change made to the code path
people rely on for their *backups*. Good luck with that.
While I quite agree with Greg's comments about not changing stable
release branches unnecessarily, it seems that there's another
consideration in this case. If we don't back-patch %r then users
will have to rely on hacky scripts like the one posted upthread.
Is that really a net gain in reliability?
(I'm honestly not sure of the answer; I'm just thinking it might
be open to debate. In particular I don't remember how complicated
the patch to add %r was.)
regards, tom lane
On Fri, Mar 28, 2008 at 9:47 AM, Greg Smith <gsmith@gregsmith.com> wrote:
On Fri, 28 Mar 2008, Gurjeet Singh wrote:
This project doesn't make functional changes to stable releases, that's
the reason why 8.2 will never get patched to add the %r feature.I completely understand that, but still was hoping that we'd change
that.
Well, then you really don't understand this at all then, so let's work on
that for a bit. http://www.postgresql.org/support/versioning is the
official statement, perhaps some examples will help clarify where and why
the line is where it is.One of the first patches I ever submitted made a minor change to a contrib
utility used solely for benchmarking (pgbench) that added a useful
feature, only if you passed it the right parameter. That was considered
for a tiny bit before being rejected as a feature change too large to put
into a stable branch.That was a small change in a utility that should never be run on a
production system. You're trying to get a change made to the code path
people rely on for their *backups*. Good luck with that.The parable I enjoy pulling out in support of this policy is MySQL bug
#31001:http://www.mysqlperformanceblog.com/2007/10/04/mysql-quality-of-old-and-new-features/
where they added a seemingly minor optimization to something and
accidentally broke the ability to sort in some cases. There's always a
small risk that comes with any code change, and this is why you don't ever
touch working production code unless you're fixing a bug that's more
troublesome than that risk.
Point well taken. And when I said 'I completely understand that', I meant I
understood Postgres' policy for patching older releases. And thanks for the
links; it feels good to know that there's an "official" stand on this topic
in Postgres, rather than 'no known serious bugs'. :)
I am still looking for comments on the correctness of this script and above
mentioned procedure for running it on an 8.2.x release.
Thanks and best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device
On Fri, 28 Mar 2008, Tom Lane wrote:
While I quite agree with Greg's comments about not changing stable
release branches unnecessarily, it seems that there's another
consideration in this case.
I was just trying to set Gurjeet's expectations appropriately while taking
the suggestion seriously anyway. This is one of those cases where you
could argue that since there is no good way to find out what to do here,
it's an operational bug serious enough to patch.
In particular I don't remember how complicated the patch to add %r was
Unfortunately it was mixed in with the archive_mode GUC and
log_startpoints changes so the diff is more complicated than just that:
http://repo.or.cz/w/PostgreSQL.git?a=commit;h=a14266760bd403abd38be499de1619a825e0438b
A quick glance suggests this part might only be 14 lines added to xlog.c
though (the 11 lines starting with "case 'r'" and the 3 new variable
definitions just above it that uses).
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 2008-03-28 at 00:54 -0400, Tom Lane wrote:
Greg Smith <gsmith@gregsmith.com> writes:
... That was a small change in a utility that should never be run on a
production system. You're trying to get a change made to the code path
people rely on for their *backups*. Good luck with that.While I quite agree with Greg's comments about not changing stable
release branches unnecessarily, it seems that there's another
consideration in this case. If we don't back-patch %r then users
will have to rely on hacky scripts like the one posted upthread.
Is that really a net gain in reliability?(I'm honestly not sure of the answer; I'm just thinking it might
be open to debate. In particular I don't remember how complicated
the patch to add %r was.)
Here's the original patch, edited to remove pg_standby changes.
I've not even checked whether it will apply, but it seems fairly simple.
Gurjeet, would you like to freshen and test that up for apply to 8.2?
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Attachments:
last_restart_point.v2.patchtext/x-patch; charset=utf-8; name=last_restart_point.v2.patchDownload
Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.97
diff -c -r2.97 backup.sgml
*** doc/src/sgml/backup.sgml 1 Feb 2007 00:28:16 -0000 2.97
--- doc/src/sgml/backup.sgml 9 Feb 2007 23:05:26 -0000
***************
*** 526,531 ****
--- 526,536 ----
archive, while any <literal>%f</> is replaced by the file name only.
(The path name is relative to the working directory of the server,
i.e., the cluster's data directory.)
+ A <literal>%r</> will be replaced by the name of the file containing
+ the last valid restartpoint. This is the earliest file that must be
+ kept to allow a restore to be restartable, so this information can
+ allow if the archive to be truncated to just the minimum required
+ to support restart of the current restore.
Write <literal>%%</> if you need to embed an actual <literal>%</>
character in the command. The simplest useful command is something
like:
***************
*** 922,927 ****
--- 927,937 ----
which is replaced by the path name to copy the log file to.
(The path name is relative to the working directory of the server,
i.e., the cluster's data directory.)
+ A <literal>%r</> will be replaced by the name of the file containing
+ the last valid restartpoint. This is the earliest file that must be
+ kept to allow a restore to be restartable, so this information can
+ allow if the archive to be truncated to just the minimum required
+ to support restart of the current restore.
Write <literal>%%</> if you need to embed an actual <literal>%</>
character in the command. The simplest useful command is
something like:
***************
*** 1008,1020 ****
<listitem>
<para>
The shell command to execute to retrieve an archived segment of
! the WAL file series. This parameter is required.
! Any <literal>%f</> in the string is
! replaced by the name of the file to retrieve from the archive,
! and any <literal>%p</> is replaced by the path name to copy
! it to on the server.
! (The path name is relative to the working directory of the server,
! i.e., the cluster's data directory.)
Write <literal>%%</> to embed an actual <literal>%</> character
in the command.
</para>
--- 1018,1033 ----
<listitem>
<para>
The shell command to execute to retrieve an archived segment of
! the WAL file series. This parameter is required. Any <literal>%f</>
! in the string is replaced by the name of the file to retrieve from
! the archive, and any <literal>%p</> is replaced by the path name to
! copy it to on the server. (The path name is relative to the working
! directory of the server, i.e., the cluster's data directory.)
! A <literal>%r</> will be replaced by the name of the file containing
! the last valid restartpoint. This is the earliest file that must be
! kept to allow a restore to be restartable, so this information can
! allow if the archive to be truncated to just the minimum required
! to support restart of the current restore.
Write <literal>%%</> to embed an actual <literal>%</> character
in the command.
</para>
***************
*** 1024,1031 ****
names that are not present in the archive; it must return nonzero
when so asked. Examples:
<programlisting>
! restore_command = 'cp /mnt/server/archivedir/%f "%p"'
! restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
</programlisting>
</para>
</listitem>
--- 1037,1044 ----
names that are not present in the archive; it must return nonzero
when so asked. Examples:
<programlisting>
! restore_command = 'cp /mnt/server/archivedir/%f "%p" "%r"'
! restore_command = 'copy /mnt/server/archivedir/%f "%p" "%r"' # Windows
</programlisting>
</para>
</listitem>
***************
*** 1374,1380 ****
contact between the two database servers is the archive of WAL files
that both share: primary writing to the archive, standby reading from
the archive. Care must be taken to ensure that WAL archives for separate
! primary servers do not become mixed together or confused.
</para>
<para>
--- 1387,1394 ----
contact between the two database servers is the archive of WAL files
that both share: primary writing to the archive, standby reading from
the archive. Care must be taken to ensure that WAL archives for separate
! primary servers do not become mixed together or confused. The archive
! need not be large, if it is only required to the standby operation.
</para>
<para>
***************
*** 1434,1439 ****
--- 1448,1466 ----
as the explicit creation of a trigger file is less error prone, if
this can be arranged.
</para>
+
+ <para>
+ A working version of a wait-for <varname>restore_command</> is provided
+ as a contrib module, known as <applications>pg_standby</>. This can be
+ extended as needed to support specific configurations or environments.
+ </para>
+
+ <para>
+ The size of the WAL archive can be minimised by using the <literal>%r</>
+ option of the <varname>restore_command</>. This option specifies the
+ last archive filename that needs to be kept to allow the recovery to
+ restart correctly. This can be used to truncate the archive once
+ files are no longer required.
</sect2>
<sect2 id="warm-standby-config">
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.263
diff -c -r1.263 xlog.c
*** src/backend/access/transam/xlog.c 8 Feb 2007 11:10:27 -0000 1.263
--- src/backend/access/transam/xlog.c 9 Feb 2007 23:05:34 -0000
***************
*** 190,195 ****
--- 190,196 ----
static TransactionId recoveryStopXid;
static time_t recoveryStopTime;
static bool recoveryStopAfter;
+ static char lastRestorepointFname[MAXFNAMELEN];
/*
* During normal operation, the only timeline we care about is ThisTimeLineID.
***************
*** 2359,2364 ****
--- 2360,2366 ----
{
char xlogpath[MAXPGPATH];
char xlogRestoreCmd[MAXPGPATH];
+ char lastRestartPointFname[MAXPGPATH];
char *dp;
char *endp;
const char *sp;
***************
*** 2437,2442 ****
--- 2439,2454 ----
StrNCpy(dp, xlogfname, endp - dp);
dp += strlen(dp);
break;
+ case 'r':
+ /* %r: filename of last restartpoint */
+ sp++;
+ XLogFileName(lastRestartPointFname,
+ ControlFile->checkPointCopy->ThisTimelineId,
+ ControlFile->checkPointCopy->redo.xlogid,
+ ControlFile->checkPointCopy->redo.xrecoff);
+ StrNCpy(dp, lastRestartPointFname, endp - dp);
+ dp += strlen(dp);
+ break;
case '%':
/* convert %% to a single % */
sp++;
On Fri, Mar 28, 2008 at 10:30 AM, Gurjeet Singh <singh.gurjeet@gmail.com>
wrote:
I am still looking for comments on the correctness of this script and above
mentioned procedure for running it on an 8.2.x release.
Well, I came across a serious bug in the script. Here's the corrected
version of the perl script for anyone who might have picked up the script
from here:
(Am still looking for any feedback on the correctness of the script;
especially in the light of the recent possible bugs fixed on the server
side, related to recovery)
<script>
my @text = `pg_controldata .`; # here . represents the PGDATA, since the
server is executing here.
my $line;
my $time_line_id;
my $redo_log_id;
my $redo_rec_off;
my $wal_seg_bytes;
my $redo_log_seg;
foreach $line ( @text )
{
$line = mychomp( $line );
if( $line =~ m/Latest checkpoint's TimeLineID:\s*(([0-9])+)/ )
{
# decimal number
$time_line_id = 0 + $1;
}
if( $line =~ m/Latest checkpoint's REDO
location:\s*(([0-9]|[A-F])+)\/(([0-9]|[A-F])+)/
)
{
# hexadecimal numbers
$redo_log_id = hex( $1 );
$redo_rec_off = hex( $3 );
}
if( $line =~ m/Bytes per WAL segment:\s*([0-9]+)/ )
{
# decimal number
$wal_seg_bytes = $1;
}
}
$redo_log_seg = sprintf( "%d", $redo_rec_off / $wal_seg_bytes );
print "" . sprintf( "%08X%08X%08X", $time_line_id, $redo_log_id,
$redo_log_seg ) . "\n";
# Wrapper around Perl's chomp function
sub mychomp
{
my ( $tmp ) = @_;
chomp( $tmp );
return $tmp;
}
</script>
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device