Review of pg_archivecleanup -x option patch
Hello,
This is my first patch review ever, so please bear with me.
The patch[1]http://archives.postgresql.org/pgsql-hackers/2011-02/msg00547.php is in the context diff format and applies cleanly to
current git HEAD. Documentation is updated by the patch.
The code does implement what's the patch is supposed to do.
Do we want that? According to Greg's original mail, yes.
One problem I've found while trying the example workflow is this:
~/local/postgresql/HEAD$ ./bin/pg_archivecleanup -d -x .gz data/archive/ 000000010000000000000002.00000020.backup.gz
pg_archivecleanup: invalid filename input
Try "pg_archivecleanup --help" for more information.
I think we could accept the suffixed WAL filename and strip the suffix;
another idea is to actually detect the suffix (if any) from the last
command line argument, thus rendering the -x option unnecessary.
By the way, I for one would use the word 'suffix' instead of 'extension'
which sounds like some MS-DOS thingy, but after briefly grepping the
docs I can see that both words are used in this context already (and the
ratio is not in favor of my choice of wording.)
Other than that the patch looks good.
--
Alex
[1]: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00547.php
On 02/01/2012 07:53 AM, Alex Shulgin wrote:
One problem I've found while trying the example workflow is this:
~/local/postgresql/HEAD$ ./bin/pg_archivecleanup -d -x .gz data/archive/ 000000010000000000000002.00000020.backup.gz
pg_archivecleanup: invalid filename input
Try "pg_archivecleanup --help" for more information.I think we could accept the suffixed WAL filename and strip the suffix;
another idea is to actually detect the suffix (if any) from the last
command line argument, thus rendering the -x option unnecessary.
Going full-on automatic with detecting the suffix seems like it might do
the wrong thing when presented with a strange extension. I know a lot
of the time I use pg_archivecleanup is inside scripts that tend to get
setup and then ignored. If one of those has a bug, or unexpected data
appears in the archive, I'd rather see this sort of major, obvious
failure--not an attempt to sort things out that might make a bad decision.
The smaller step of automatically stripping the specified suffix from
the input file name, when it matches the one we've told the program to
expect, seems like a usability improvement unlikely to lead to bad
things though. I've certainly made the mistake you've shown when using
the patched version of the program myself, just didn't think about
catching and correcting it myself before. We can rev this to add that
feature and resubmit easily enough, will turn that around soon.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
On Sun, Feb 12, 2012 at 9:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
The smaller step of automatically stripping the specified suffix from the
input file name, when it matches the one we've told the program to expect,
seems like a usability improvement unlikely to lead to bad things though.
I've certainly made the mistake you've shown when using the patched version
of the program myself, just didn't think about catching and correcting it
myself before. We can rev this to add that feature and resubmit easily
enough, will turn that around soon.
This change seems plenty small enough to slip in even at this late
date, but I guess we're lacking a new version of the patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 8, 2012 at 8:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 12, 2012 at 9:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
The smaller step of automatically stripping the specified suffix from the
input file name, when it matches the one we've told the program to expect,
seems like a usability improvement unlikely to lead to bad things though.
I've certainly made the mistake you've shown when using the patched version
of the program myself, just didn't think about catching and correcting it
myself before. We can rev this to add that feature and resubmit easily
enough, will turn that around soon.This change seems plenty small enough to slip in even at this late
date, but I guess we're lacking a new version of the patch.
Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :D
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
On Fri, Mar 9, 2012 at 12:46 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Thu, Mar 8, 2012 at 8:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 12, 2012 at 9:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
The smaller step of automatically stripping the specified suffix from the
input file name, when it matches the one we've told the program to expect,
seems like a usability improvement unlikely to lead to bad things though.
I've certainly made the mistake you've shown when using the patched version
of the program myself, just didn't think about catching and correcting it
myself before. We can rev this to add that feature and resubmit easily
enough, will turn that around soon.This change seems plenty small enough to slip in even at this late
date, but I guess we're lacking a new version of the patch.Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :D
with the patch this time
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Attachments:
archivecleanup_extensions-v3.patchtext/x-patch; charset=US-ASCII; name=archivecleanup_extensions-v3.patchDownload
diff --git a/contrib/pg_archivecleanup/pg_archivecleanup.c b/contrib/pg_archivecleanup/pg_archivecleanup.c
new file mode 100644
index adfc83d..8f6ca2c
*** a/contrib/pg_archivecleanup/pg_archivecleanup.c
--- b/contrib/pg_archivecleanup/pg_archivecleanup.c
*************** const char *progname;
*** 37,42 ****
--- 37,43 ----
/* Options and defaults */
bool debug = false; /* are we debugging? */
bool dryrun = false; /* are we performing a dry-run operation? */
+ char *additional_ext = NULL; /* Extension to remove from filenames */
char *archiveLocation; /* where to find the archive? */
char *restartWALFileName; /* the file from which we can restart restore */
*************** static void
*** 94,106 ****
--- 95,136 ----
CleanupPriorWALFiles(void)
{
int rc;
+ int chop_at;
DIR *xldir;
struct dirent *xlde;
+ char walfile[MAXPGPATH];
if ((xldir = opendir(archiveLocation)) != NULL)
{
while ((xlde = readdir(xldir)) != NULL)
{
+ strncpy(walfile, xlde->d_name, MAXPGPATH);
+ /*
+ * Remove any specified additional extension from the filename
+ * before testing it against the conditions below.
+ */
+ if (additional_ext)
+ {
+ chop_at = strlen(walfile) - strlen(additional_ext);
+ /*
+ * Only chop if this is long enough to be a file name and the
+ * extension matches.
+ */
+ if ((chop_at >= (XLOG_DATA_FNAME_LEN - 1)) &&
+ (strcmp(walfile + chop_at,additional_ext)==0))
+ {
+ walfile[chop_at] = '\0';
+ /*
+ * This is too chatty even for regular debug output, but
+ * leaving it in for program testing.
+ */
+ if (false)
+ fprintf(stderr,
+ "removed extension='%s' from file=%s result=%s\n",
+ additional_ext,xlde->d_name,walfile);
+ }
+ }
+
/*
* We ignore the timeline part of the XLOG segment identifiers in
* deciding whether a segment is still needed. This ensures that
*************** CleanupPriorWALFiles(void)
*** 114,123 ****
* file. Note that this means files are not removed in the order
* they were originally written, in case this worries you.
*/
! if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
! strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
! strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)
{
snprintf(WALFilePath, MAXPGPATH, "%s/%s",
archiveLocation, xlde->d_name);
--- 144,157 ----
* file. Note that this means files are not removed in the order
* they were originally written, in case this worries you.
*/
! if (strlen(walfile) == XLOG_DATA_FNAME_LEN &&
! strspn(walfile, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
! strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
{
+ /*
+ * Use the original file name again now, including any extension
+ * that might have been chopped off before testing the sequence.
+ */
snprintf(WALFilePath, MAXPGPATH, "%s/%s",
archiveLocation, xlde->d_name);
*************** SetWALFileNameForCleanup(void)
*** 167,172 ****
--- 201,228 ----
{
bool fnameOK = false;
+ /* if restartWALFileName was specified with an extension remove it first */
+ if (additional_ext)
+ {
+ int max_len = 0; /* Initialize just to keep quiet the compiler */
+ int chop_at = strlen(restartWALFileName) - strlen(additional_ext);
+
+ if (strlen(restartWALFileName) == (XLOG_DATA_FNAME_LEN + strlen(additional_ext)))
+ max_len = XLOG_DATA_FNAME_LEN;
+ else if (strlen(restartWALFileName) == (XLOG_BACKUP_FNAME_LEN + strlen(additional_ext) + 1))
+ max_len = XLOG_BACKUP_FNAME_LEN;
+
+ /*
+ * Only chop if this is long enough to be a file name and the
+ * extension matches.
+ */
+ if ((chop_at >= (max_len - 1)) &&
+ (strcmp(restartWALFileName + chop_at, additional_ext)==0))
+ {
+ restartWALFileName[chop_at] = '\0';
+ }
+ }
+
/*
* If restartWALFileName is a WAL file name then just use it directly. If
* restartWALFileName is a .backup filename, make sure we use the prefix
*************** usage(void)
*** 223,228 ****
--- 279,285 ----
printf("\nOptions:\n");
printf(" -d generates debug output (verbose mode)\n");
printf(" -n shows the names of the files that would have been removed (dry-run)\n");
+ printf(" -x EXT cleanup files if they have this same extension\n");
printf(" --help show this help, then exit\n");
printf(" --version output version information, then exit\n");
printf("\n"
*************** main(int argc, char **argv)
*** 259,265 ****
}
}
! while ((c = getopt(argc, argv, "dn")) != -1)
{
switch (c)
{
--- 316,322 ----
}
}
! while ((c = getopt(argc, argv, "x:dn")) != -1)
{
switch (c)
{
*************** main(int argc, char **argv)
*** 269,274 ****
--- 326,334 ----
case 'n': /* Dry-Run mode */
dryrun = true;
break;
+ case 'x':
+ additional_ext = optarg; /* Extension to remove from xlogfile names */
+ break;
default:
fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
exit(2);
diff --git a/doc/src/sgml/pgarchivecleanup.sgml b/doc/src/sgml/pgarchivecleanup.sgml
new file mode 100644
index 8148c53..40bac85
*** a/doc/src/sgml/pgarchivecleanup.sgml
--- b/doc/src/sgml/pgarchivecleanup.sgml
*************** pg_archivecleanup: removing file "archi
*** 107,112 ****
--- 107,127 ----
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-x</option> <replaceable>extension</></term>
+ <listitem>
+ <para>
+ When using the program as a standalone utility, provide an extension
+ that will be stripped from all file names before deciding if they
+ should be deleted. This is typically useful for cleaning up archives
+ that have been compressed during storage, and therefore have had an
+ extension added by the compression program. Note that the
+ <filename>.backup</> file name passed to the program should not
+ include the extension.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
On Fri, Mar 9, 2012 at 12:47 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :Dwith the patch this time
This may be a stupid idea, but it seems to me that it might be better
to dispense with all of the logic in here to detect whether the file
name is still going to be long enough after chomping the extension. I
feel like that's just making things complicated. I assume the
extensions we're thinking people will want to strip here are things
like ".gz", in which case there should be no confusion; and if
someone's dumb enough to use an extension like "0" (with no dot or
anything) but only sometimes then I think they deserve what they get
(viz: errors). See attached (only lightly tested) patch for what I'm
thinking of.
Also, I'm wondering about this warning in the documentation:
+ extension added by the compression program. Note that the
+ <filename>.backup</> file name passed to the program should not
+ include the extension.
IIUC, the latest change you made makes that warning obsolete, no?
[rhaas pgsql]$ contrib/pg_archivecleanup/pg_archivecleanup -d -x .gz .
000000010000000000000010.00000020.backup.gz
pg_archivecleanup: keep WAL file "./000000010000000000000010" and later
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
archivecleanup-extension-rmh.patchapplication/octet-stream; name=archivecleanup-extension-rmh.patchDownload
diff --git a/contrib/pg_archivecleanup/pg_archivecleanup.c b/contrib/pg_archivecleanup/pg_archivecleanup.c
index adfc83d..1b3cecf 100644
--- a/contrib/pg_archivecleanup/pg_archivecleanup.c
+++ b/contrib/pg_archivecleanup/pg_archivecleanup.c
@@ -37,6 +37,7 @@ const char *progname;
/* Options and defaults */
bool debug = false; /* are we debugging? */
bool dryrun = false; /* are we performing a dry-run operation? */
+char *additional_ext = NULL; /* Extension to remove from filenames */
char *archiveLocation; /* where to find the archive? */
char *restartWALFileName; /* the file from which we can restart restore */
@@ -91,16 +92,36 @@ Initialize(void)
}
static void
+TrimExtension(char *filename, char *extension)
+{
+ int flen;
+ int elen;
+
+ if (extension == NULL)
+ return;
+
+ elen = strlen(extension);
+ flen = strlen(filename);
+
+ if (flen > elen && strcmp(filename + flen - elen, extension) == 0)
+ filename[flen - elen] = '\0';
+}
+
+static void
CleanupPriorWALFiles(void)
{
int rc;
DIR *xldir;
struct dirent *xlde;
+ char walfile[MAXPGPATH];
if ((xldir = opendir(archiveLocation)) != NULL)
{
while ((xlde = readdir(xldir)) != NULL)
{
+ strncpy(walfile, xlde->d_name, MAXPGPATH);
+ TrimExtension(walfile, additional_ext);
+
/*
* We ignore the timeline part of the XLOG segment identifiers in
* deciding whether a segment is still needed. This ensures that
@@ -114,10 +135,14 @@ CleanupPriorWALFiles(void)
* file. Note that this means files are not removed in the order
* they were originally written, in case this worries you.
*/
- if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
- strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
- strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)
+ if (strlen(walfile) == XLOG_DATA_FNAME_LEN &&
+ strspn(walfile, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
+ strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)
{
+ /*
+ * Use the original file name again now, including any extension
+ * that might have been chopped off before testing the sequence.
+ */
snprintf(WALFilePath, MAXPGPATH, "%s/%s",
archiveLocation, xlde->d_name);
@@ -167,6 +192,8 @@ SetWALFileNameForCleanup(void)
{
bool fnameOK = false;
+ TrimExtension(restartWALFileName, additional_ext);
+
/*
* If restartWALFileName is a WAL file name then just use it directly. If
* restartWALFileName is a .backup filename, make sure we use the prefix
@@ -223,6 +250,7 @@ usage(void)
printf("\nOptions:\n");
printf(" -d generates debug output (verbose mode)\n");
printf(" -n shows the names of the files that would have been removed (dry-run)\n");
+ printf(" -x EXT cleanup files if they have this same extension\n");
printf(" --help show this help, then exit\n");
printf(" --version output version information, then exit\n");
printf("\n"
@@ -259,7 +287,7 @@ main(int argc, char **argv)
}
}
- while ((c = getopt(argc, argv, "dn")) != -1)
+ while ((c = getopt(argc, argv, "x:dn")) != -1)
{
switch (c)
{
@@ -269,6 +297,9 @@ main(int argc, char **argv)
case 'n': /* Dry-Run mode */
dryrun = true;
break;
+ case 'x':
+ additional_ext = optarg; /* Extension to remove from xlogfile names */
+ break;
default:
fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
exit(2);
diff --git a/doc/src/sgml/pgarchivecleanup.sgml b/doc/src/sgml/pgarchivecleanup.sgml
index 8148c53..40bac85 100644
--- a/doc/src/sgml/pgarchivecleanup.sgml
+++ b/doc/src/sgml/pgarchivecleanup.sgml
@@ -107,6 +107,21 @@ pg_archivecleanup: removing file "archive/00000001000000370000000E"
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-x</option> <replaceable>extension</></term>
+ <listitem>
+ <para>
+ When using the program as a standalone utility, provide an extension
+ that will be stripped from all file names before deciding if they
+ should be deleted. This is typically useful for cleaning up archives
+ that have been compressed during storage, and therefore have had an
+ extension added by the compression program. Note that the
+ <filename>.backup</> file name passed to the program should not
+ include the extension.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
On Fri, Mar 9, 2012 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 12:47 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :Dwith the patch this time
This may be a stupid idea, but it seems to me that it might be better
to dispense with all of the logic in here to detect whether the file
name is still going to be long enough after chomping the extension. I
feel like that's just making things complicated.
while i like the idea of separating the logic, i don't like the results:
for example i tried this (notice that i forgot the point), and it just
says nothing (not even that the file with the extension but without
the point doesn't exist). that's why we were checking that the length
matches
$ ./pg_archivecleanup -x "bz2" /tmp 000000010000000100000058
$
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
On Sat, Mar 10, 2012 at 2:38 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Fri, Mar 9, 2012 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 12:47 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :Dwith the patch this time
This may be a stupid idea, but it seems to me that it might be better
to dispense with all of the logic in here to detect whether the file
name is still going to be long enough after chomping the extension. I
feel like that's just making things complicated.while i like the idea of separating the logic, i don't like the results:
for example i tried this (notice that i forgot the point), and it just
says nothing (not even that the file with the extension but without
the point doesn't exist). that's why we were checking that the length
matches$ ./pg_archivecleanup -x "bz2" /tmp 000000010000000100000058
$
i would add this in the middle of the TrimExtension() function:
if ((flen - (elen+1) != XLOG_DATA_FNAME_LEN && flen - (elen+1) !=
XLOG_BACKUP_FNAME_LEN) &&
(flen != XLOG_DATA_FNAME_LEN && flen != XLOG_BACKUP_FNAME_LEN))
{
fprintf(stderr, "%s: invalid filename input\n", progname);
fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
exit(2);
}
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
On Sat, Mar 10, 2012 at 2:38 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Fri, Mar 9, 2012 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 12:47 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :Dwith the patch this time
This may be a stupid idea, but it seems to me that it might be better
to dispense with all of the logic in here to detect whether the file
name is still going to be long enough after chomping the extension. I
feel like that's just making things complicated.while i like the idea of separating the logic, i don't like the results:
for example i tried this (notice that i forgot the point), and it just
says nothing (not even that the file with the extension but without
the point doesn't exist). that's why we were checking that the length
matches$ ./pg_archivecleanup -x "bz2" /tmp 000000010000000100000058
Hmm, but I thought that the idea was that the extension was optional.
Perhaps I'm missing something but I don't think the previous patch
will complain about that either; or at least I don't see why the
behavior should be any different.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Mar 11, 2012 at 9:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Mar 10, 2012 at 2:38 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Fri, Mar 9, 2012 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 9, 2012 at 12:47 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
Sorry, here's the patch rebased and with the suggestion from Alex.
Which reminds me, I never thank him for the review (shame on me) :Dwith the patch this time
This may be a stupid idea, but it seems to me that it might be better
to dispense with all of the logic in here to detect whether the file
name is still going to be long enough after chomping the extension. I
feel like that's just making things complicated.while i like the idea of separating the logic, i don't like the results:
for example i tried this (notice that i forgot the point), and it just
says nothing (not even that the file with the extension but without
the point doesn't exist). that's why we were checking that the length
matches$ ./pg_archivecleanup -x "bz2" /tmp 000000010000000100000058
Hmm, but I thought that the idea was that the extension was optional.
Perhaps I'm missing something but I don't think the previous patch
will complain about that either; or at least I don't see why the
behavior should be any different.
Can someone enlighten me on this point?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 28, 2012 at 8:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
$ ./pg_archivecleanup -x "bz2" /tmp 000000010000000100000058
Hmm, but I thought that the idea was that the extension was optional.
Perhaps I'm missing something but I don't think the previous patch
will complain about that either; or at least I don't see why the
behavior should be any different.Can someone enlighten me on this point?
mmm! you're right... it's not complaining either... i was sure it was...
and i'm not sure i want to contor things for that...
so, just forget my last mail about that... your refactor is just fine for me
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
On Wed, Mar 28, 2012 at 12:13 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
On Wed, Mar 28, 2012 at 8:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
$ ./pg_archivecleanup -x "bz2" /tmp 000000010000000100000058
Hmm, but I thought that the idea was that the extension was optional.
Perhaps I'm missing something but I don't think the previous patch
will complain about that either; or at least I don't see why the
behavior should be any different.Can someone enlighten me on this point?
mmm! you're right... it's not complaining either... i was sure it was...
and i'm not sure i want to contor things for that...so, just forget my last mail about that... your refactor is just fine for me
OK, committed that way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company