Command to prune archive at restartpoints
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.
This was discussed at
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
other things.
Per discussion, attached patch adds a new restartpoint_command option to
recovery.conf. That's an external shell command just like
recovery_end_command that's executed at every restartpoint. You can use
the %r parameter to pass the filename of the oldest WAL file that needs
to be retained.
While developing this I noticed that %r in recovery_end_command is not
working correctly:
LOG: redo done at 0/14000C10
LOG: last completed transaction was at log time 2000-01-01
02:21:08.816445+02
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/000000010000000000000014': No
such file or directory
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/00000002.history': No such file
or directory
LOG: selected new timeline ID: 2
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/00000001.history': No such file
or directory
LOG: archive recovery complete
LOG: checkpoint starting: end-of-recovery immediate wait
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
total=0.003 s
LOG: executing recovery_end_command "echo recovery_end_command %r"
recovery_end_command 000000000000000000000000
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Note how %r is always expanded to 000000000000000000000000. That's
because %r is expanded only when InRedo is true, which makes sense for
restore_command where that piece of code was copy-pasted from, but it's
never true anymore when recovery_end_command is run. The attached patch
fixes that too.
Barring objections, I will commit this later today.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
restartpoint_command-1.patchtext/x-diff; name=restartpoint_command-1.patchDownload+89-32
On Wed, Mar 17, 2010 at 9:37 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.
I'm still finding this kind of narrow-minded. I'm picturing a system
with multiple replicas -- obvious no one replica can take it upon
itself to delete archived log files based only on its own
restartpoint. And besides, if you're using the archived log files for
backups you also need to take into account the backup policy and only
delete files that aren't needed for a consistent backup and aren't
needed for the replica.
What we need is a program which can take all this information from all
your slaves and backup labels into account and implement your backup
policies. It probably won't exist in time for the release and in any
case doesn't really have to ship with Postgres. There might even be
more than one.
But do we have all the information that such a program would need? Is
there a way to connect to a replica and ask it what the restart point
is? I suppose with this new command you could always just make it a
command which wakes up this demon and sends it the restart point and
the replica id and it can update its internal state and recalculate
what archives are needed. It is a bit nerve-wracking that it's
dependent on its internal state remembering the restart points it's
been given though.
--
greg
Greg Stark wrote:
On Wed, Mar 17, 2010 at 9:37 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.I'm still finding this kind of narrow-minded. I'm picturing a system
with multiple replicas -- obvious no one replica can take it upon
itself to delete archived log files based only on its own
restartpoint. And besides, if you're using the archived log files for
backups you also need to take into account the backup policy and only
delete files that aren't needed for a consistent backup and aren't
needed for the replica.
That's why we provide options that take any shell command you want,
rather than e.g a path to an archive directory that's pruned automatically.
For example, if you have multiple standbys sharing one archive, you
could do something like this:
In each standby, have a restartpoint_command along the lines of:
"echo %r > <archivedirectory>/standby1_location; archive_cleanup.sh"
Where '1' is different for every standby
and in archive_cleanup.sh, scan through all the standbyX_location files,
take the minimum, and delete all files smaller than that.
You'll need some care with locking etc., but the point is that the
current hooks allow you to implement complex setups like that.
What we need is a program which can take all this information from all
your slaves and backup labels into account and implement your backup
policies. It probably won't exist in time for the release and in any
case doesn't really have to ship with Postgres. There might even be
more than one.
I guess I just described such a program :-). Yeah, I'd imagine that to
become part of toolkits like skytools.
But do we have all the information that such a program would need? Is
there a way to connect to a replica and ask it what the restart point
is?
Hmm, Greg Smith opened a thread on exposing the fields in the control
file as user-defined functions. IIRC last restartpoint location was the
piece of information that triggered the discussion this time. Perhaps we
should indeed add a function to expose that in 9.0.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Committed.
Heikki Linnakangas wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.
This was discussed at
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
other things.Per discussion, attached patch adds a new restartpoint_command option to
recovery.conf. That's an external shell command just like
recovery_end_command that's executed at every restartpoint. You can use
the %r parameter to pass the filename of the oldest WAL file that needs
to be retained.While developing this I noticed that %r in recovery_end_command is not
working correctly:LOG: redo done at 0/14000C10
LOG: last completed transaction was at log time 2000-01-01
02:21:08.816445+02
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/000000010000000000000014': No
such file or directory
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/00000002.history': No such file
or directory
LOG: selected new timeline ID: 2
cp: cannot stat
`/home/hlinnaka/pgsql.cvshead/walarchive/00000001.history': No such file
or directory
LOG: archive recovery complete
LOG: checkpoint starting: end-of-recovery immediate wait
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
total=0.003 s
LOG: executing recovery_end_command "echo recovery_end_command %r"
recovery_end_command 000000000000000000000000
LOG: database system is ready to accept connections
LOG: autovacuum launcher startedNote how %r is always expanded to 000000000000000000000000. That's
because %r is expanded only when InRedo is true, which makes sense for
restore_command where that piece of code was copy-pasted from, but it's
never true anymore when recovery_end_command is run. The attached patch
fixes that too.Barring objections, I will commit this later today.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.
This was discussed at
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
other things.
...
Barring objections, I will commit this later today.
Would it be better to call this "archive_cleanup_command"? That might
help people understand the need for and the use of this parameter.
--
Simon Riggs www.2ndQuadrant.com
On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.Would it be better to call this "archive_cleanup_command"? That might
help people understand the need for and the use of this parameter.
This is bikeshedding but fwiw I like Simon's suggestion.
--
greg
On Mon, Mar 22, 2010 at 11:58 AM, Greg Stark <gsstark@mit.edu> wrote:
On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.Would it be better to call this "archive_cleanup_command"? That might
help people understand the need for and the use of this parameter.This is bikeshedding but fwiw I like Simon's suggestion.
So, this thread is hanging out on our list of open items for 9.0. My
personal opinion on it is that I don't really care much one way or the
other. archive_cleanup_command does seem easier to understand, but
restartpoint_command has the advantage of describing exactly when it
gets run from a technical perspective, which might be a good thing,
too. Since nobody's felt motivated to do anything about this for two
and a half months and we've now been through two betas with it the way
it is, I'm inclined to say we should just leave it alone. On the
other hand, both of the people who voted in favor of changing it are
committers, and if one of them feels like putting in the effort to
change it, it won't bother me much, except that I feel it should get
done RSN. But one way or the other we need to make a decision and get
this off the list.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Tue, 2010-06-08 at 17:17 -0400, Robert Haas wrote:
On Mon, Mar 22, 2010 at 11:58 AM, Greg Stark <gsstark@mit.edu> wrote:
On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.Would it be better to call this "archive_cleanup_command"? That might
help people understand the need for and the use of this parameter.This is bikeshedding but fwiw I like Simon's suggestion.
So, this thread is hanging out on our list of open items for 9.0. My
personal opinion on it is that I don't really care much one way or the
other. archive_cleanup_command does seem easier to understand, but
restartpoint_command has the advantage of describing exactly when it
gets run from a technical perspective, which might be a good thing,
too. Since nobody's felt motivated to do anything about this for two
and a half months and we've now been through two betas with it the way
it is, I'm inclined to say we should just leave it alone. On the
other hand, both of the people who voted in favor of changing it are
committers, and if one of them feels like putting in the effort to
change it, it won't bother me much, except that I feel it should get
done RSN. But one way or the other we need to make a decision and get
this off the list.
Yes, restartpoint_command is exactly correct, and I do understand it; I
just don't think anyone else will. If there's another use for a
restartpoint_command other than for clearing up an archive, then it
would be sufficient to destroy the name change idea.
--
Simon Riggs www.2ndQuadrant.com
Robert Haas wrote:
On Mon, Mar 22, 2010 at 11:58 AM, Greg Stark <gsstark@mit.edu> wrote:
On Thu, Mar 18, 2010 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:
One awkward omission in the new built-in standby mode, mainly used for
streaming replication, is that there is no easy way to delete old
archived files like you do with the %r parameter to restore_command.Would it be better to call this "archive_cleanup_command"? That might
help people understand the need for and the use of this parameter.This is bikeshedding but fwiw I like Simon's suggestion.
So, this thread is hanging out on our list of open items for 9.0. My
personal opinion on it is that I don't really care much one way or the
other. archive_cleanup_command does seem easier to understand, but
restartpoint_command has the advantage of describing exactly when it
gets run from a technical perspective, which might be a good thing,
too. Since nobody's felt motivated to do anything about this for two
and a half months and we've now been through two betas with it the way
it is, I'm inclined to say we should just leave it alone. On the
other hand, both of the people who voted in favor of changing it are
committers, and if one of them feels like putting in the effort to
change it, it won't bother me much, except that I feel it should get
done RSN. But one way or the other we need to make a decision and get
this off the list.
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.
More importantly, we should include an example in the docs. I created
one the other day when this was actually bothering me a bit (see
<http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html>).
That seemed to work ok, but maybe it's too long, and maybe people would
prefer a shell script to perl.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.
Weak preference for archive_cleanup_command here.
More importantly, we should include an example in the docs. I created
one the other day when this was actually bothering me a bit (see
<http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html>).
That seemed to work ok, but maybe it's too long, and maybe people would
prefer a shell script to perl.
Short is good. Maybe you could remove the logging stuff from the
example.
As for the language choice, my first thought is +1 for perl over shell,
mainly because it might be directly useful to people on Windows while
shell never would be. On the other hand, if it's possible to do a
useful one-liner in shell then let's do it that way.
regards, tom lane
Tom Lane wrote:
As for the language choice, my first thought is +1 for perl over shell,
mainly because it might be directly useful to people on Windows while
shell never would be. On the other hand, if it's possible to do a
useful one-liner in shell then let's do it that way.
I don't think it is, reasonably. But here is fairly minimal version that
might suit the docs:
use strict;
my ($dir, $num) = @ARGV;
foreach my $file (glob("$dir/*"))
{
my $name = basename($file);
unlink $file if (-f $file && $name =~ /^[0-9A-Z]{24}$/ && $name lt $num);
}
cheers
andrew
On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.Weak preference for archive_cleanup_command here.
OK, sounds like we have consensus on that. Who wants to do it?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.Weak preference for archive_cleanup_command here.
OK, sounds like we have consensus on that. Who wants to do it?
Do we just need to replace all of them? If so, patch attached.
I replaced 3 terms: recovery_end_command, recovery-end-command,
and recoveryEndCommand.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Attachments:
archive_cleanup_command.patchapplication/octet-stream; name=archive_cleanup_command.patchDownload+22-22
On Tue, Jun 8, 2010 at 9:45 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.Weak preference for archive_cleanup_command here.
OK, sounds like we have consensus on that. Who wants to do it?
Do we just need to replace all of them? If so, patch attached.
I replaced 3 terms: recovery_end_command, recovery-end-command,
and recoveryEndCommand.
I think we're replacing restartpoint_command, not recovery_end_command.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jun 9, 2010 at 10:45 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 8, 2010 at 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.Weak preference for archive_cleanup_command here.
OK, sounds like we have consensus on that. Who wants to do it?
Do we just need to replace all of them? If so, patch attached.
I replaced 3 terms: recovery_end_command, recovery-end-command,
and recoveryEndCommand.
s/recovery_end_command/restartpoint_command?
I prefer restartpoint_command over archive_cleanup_command because
not only restartpoint_command but also recovery_end_command is used
for archive cleanup.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Robert Haas <robertmhaas@gmail.com> wrote:
I think we're replacing restartpoint_command, not recovery_end_command.
Ah, sorry. I did the same replacement for restartpoint_command
in _, -, and camel case words.
BTW, should we also have a release note for the command?
I added a simple description for it in the patch.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Attachments:
estartpoint-to-archive_cleanup.patchapplication/octet-stream; name=estartpoint-to-archive_cleanup.patchDownload+34-25
On Tue, Jun 8, 2010 at 10:18 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
I think we're replacing restartpoint_command, not recovery_end_command.
Ah, sorry. I did the same replacement for restartpoint_command
in _, -, and camel case words.
Gah. Perhaps one of these days we will stop spelling every identifier
in multiple different ways.
BTW, should we also have a release note for the command?
I added a simple description for it in the patch.
Yeah, it should be definitely mentioned in the release notes somewhere, I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, 2010-06-09 at 11:18 +0900, Takahiro Itagaki wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
I think we're replacing restartpoint_command, not recovery_end_command.
Ah, sorry. I did the same replacement for restartpoint_command
in _, -, and camel case words.BTW, should we also have a release note for the command?
I added a simple description for it in the patch.
I don't think so, its not a separate feature. It's a change as part of
SR.
--
Simon Riggs www.2ndQuadrant.com
On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.More importantly, we should include an example in the docs. I created
one the other day when this was actually bothering me a bit (see
<http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html>).
That seemed to work ok, but maybe it's too long, and maybe people would
prefer a shell script to perl.
I submitted a patch to make the command "pg_standby -a %r"
That's a more portable solution, ISTM.
I'll commit that and fix the docs.
--
Simon Riggs www.2ndQuadrant.com
On 09/06/10 10:21, Simon Riggs wrote:
On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
I prefer archive_cleanup_command. We should name things after their
principal function, not an implementation detail, IMNSHO.More importantly, we should include an example in the docs. I created
one the other day when this was actually bothering me a bit (see
<http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html>).
That seemed to work ok, but maybe it's too long, and maybe people would
prefer a shell script to perl.I submitted a patch to make the command "pg_standby -a %r"
That's a more portable solution, ISTM.
I'll commit that and fix the docs.
Huh, wait. There's no -a option in pg_standby, so I presume you're
planning to add that too. I don't like confusing pg_standby into this,
the docs are currently quite clear that if you want to use the built-in
standby mode, you can't use pg_standby, and this would muddy the waters.
Maybe we could add a new pg_cleanuparchive binary, but we'll need some
discussion...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com