Refactoring of compression options in pg_basebackup
Hi all,
(Added Georgios in CC.)
When working on the support of LZ4 for pg_receivewal, walmethods.c has
gained an extra parameter for the compression method. It gets used on
the DirectoryMethod instead of the compression level to decide which
type of compression is used. One thing that I left out during this
previous work is that the TarMethod also gained knowledge of this
compression method, but we still use the compression level to check if
tars should be compressed or not.
This is wrong on multiple aspects. First, this is not consistent with
the directory method, making walmethods.c harder to figure out.
Second, this is not extensible if we want to introduce more
compression methods in pg_basebackup, like LZ4. This reflects on the
options used by pg_receivewal and pg_basebackup, that are not
inconsistent as well.
The attached patch refactors the code of pg_basebackup and the
TarMethod of walmethods.c to use the compression method where it
should, splitting entirely the logic related the compression level.
This is one step toward the introduction of LZ4 in pg_basebackup, but
this refactoring is worth doing on its own, hence a separate thread to
deal with this problem first. The options of pg_basebackup are
reworked to be consistent with pg_receivewal, as follows:
- --compress ranges now from 1 to 9, instead of 0 to 9.
- --compression-method={none,gzip} is added, the default is none, same
as HEAD.
- --gzip/-z has the same meaning as before, being just a synonym of
--compression-method=gzip with the default compression level of ZLIB
assigned if there is no --compress.
One more thing that I have noticed while hacking this stuff is that we
have no regression tests for gzip with pg_basebackup, so I have added
some that are skipped when not compiling the code with ZLIB.
Opinions?
--
Michael
Attachments:
0001-Refactor-options-of-pg_basebackup-for-compression-le.patchtext/x-diff; charset=us-asciiDownload
From 6457eae3577cd0074f58797c4fa420c296d68b39 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Sat, 18 Dec 2021 20:27:30 +0900
Subject: [PATCH] Refactor options of pg_basebackup for compression level and
methods
---
src/bin/pg_basebackup/pg_basebackup.c | 79 +++++++++++++++-----
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 47 +++++++++++-
src/bin/pg_basebackup/walmethods.c | 47 +++++++-----
doc/src/sgml/ref/pg_basebackup.sgml | 22 +++++-
4 files changed, 155 insertions(+), 40 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1739ac6382..01529aa036 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -113,6 +113,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static WalCompressionMethod compression_method = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -359,7 +360,9 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -Z, --compress=1-9 compress tar output with given compression level\n"));
+ printf(_(" --compression-method=METHOD\n"
+ " method to compress data\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -524,7 +527,7 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE, /* ignored */
+ compression_method,
compresslevel,
stream.do_sync);
@@ -1034,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
+ if (compression_method == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compression_method == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/*
* If we need to parse the archive for whatever reason, then we'll
@@ -1765,6 +1771,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"compression-method", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -1872,14 +1879,10 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
+ compression_method = COMPRESSION_GZIP;
break;
case 'Z':
- if (!option_parse_int(optarg, "-Z/--compress", 0, 9,
+ if (!option_parse_int(optarg, "-Z/--compress", 1, 9,
&compresslevel))
exit(1);
break;
@@ -1941,6 +1944,18 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ if (pg_strcasecmp(optarg, "gzip") == 0)
+ compression_method = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(optarg, "none") == 0)
+ compression_method = COMPRESSION_NONE;
+ else
+ {
+ pg_log_error("invalid value \"%s\" for option %s",
+ optarg, "--compression-method");
+ exit(1);
+ }
+ break;
default:
/*
@@ -1978,7 +1993,7 @@ main(int argc, char **argv)
/*
* Mutually exclusive arguments
*/
- if (format == 'p' && compresslevel != 0)
+ if (format == 'p' && compression_method != COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2056,13 +2071,39 @@ main(int argc, char **argv)
}
}
-#ifndef HAVE_LIBZ
- if (compresslevel != 0)
+ /*
+ * Compression-related options.
+ */
+ switch (compression_method)
{
- pg_log_error("this build does not support compression");
- exit(1);
- }
+ case COMPRESSION_NONE:
+ if (compresslevel != 0)
+ {
+ pg_log_error("cannot use --compress with --compression-method=%s",
+ "none");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ break;
+ case COMPRESSION_GZIP:
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ {
+ pg_log_info("no value specified for --compress, switching to default");
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ }
+#else
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ break;
+ case COMPRESSION_LZ4:
+ /* option not supported */
+ Assert(false);
+ break;
+ }
if (showprogress && !estimatesize)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e56825382c..3dbda3b5a8 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 110;
+use Test::More tests => 115;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -31,6 +31,17 @@ my $pgdata = $node->data_dir;
$node->command_fails(['pg_basebackup'],
'pg_basebackup needs target directory specified');
+# Sanity checks for options
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compression-method',
+ 'none', '--compress',
+ '1'
+ ],
+ qr/\Qpg_basebackup: error: cannot use --compress with --compression-method=none/,
+ 'failure if --compress specified with --compression-method=none');
+
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -624,3 +635,37 @@ rmtree("$tempdir/backup_corrupt4");
$node->safe_psql('postgres', "DROP TABLE corrupt1;");
$node->safe_psql('postgres', "DROP TABLE corrupt2;");
+
+note "Testing pg_basebackup with compression methods";
+
+# Check ZLIB compression if available.
+SKIP:
+{
+ skip "postgres was not built with ZLIB support", 5
+ if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_gzip", '--compression-method',
+ 'gzip', '--compress', '1', '--no-sync', '--format', 't'
+ ],
+ 'pg_basebackup with --compress and --compression-method=gzip');
+
+ # Verify that the stored files are generated with their expected
+ # names.
+ my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+ is(scalar(@zlib_files), 2,
+ "two files created with gzip (base.tar.gz and pg_wal.tar.gz)");
+
+ # Check the integrity of the files generated.
+ my $gzip = $ENV{GZIP_PROGRAM};
+ skip "program gzip is not found in your system", 1
+ if ( !defined $gzip
+ || $gzip eq ''
+ || system_log($gzip, '--version') != 0);
+
+ my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
+ is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+ rmtree("$tempdir/backup_gzip");
+}
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index affdc5055f..127bb5e013 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -749,7 +749,7 @@ tar_write(Walfile f, const void *buf, size_t count)
tar_clear_error();
/* Tarfile will always be positioned at the end */
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
r = write(tar_data->fd, buf, count);
@@ -833,7 +833,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
tar_data->zp = (z_streamp) pg_malloc(sizeof(z_stream));
tar_data->zp->zalloc = Z_NULL;
@@ -884,7 +884,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
pg_free(tmppath);
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Flush existing data */
if (!tar_write_compressed_data(NULL, 0, true))
@@ -909,7 +909,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
tar_data->currentfile->currpos = 0;
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, tar_data->currentfile->header,
@@ -923,7 +923,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Write header through the zlib APIs but with no compression */
if (!tar_write_compressed_data(tar_data->currentfile->header,
@@ -938,6 +938,10 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
tar_data->currentfile->pathname = pg_strdup(pathname);
@@ -948,7 +952,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
if (pad_to_size)
{
tar_data->currentfile->pad_to_size = pad_to_size;
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
/* Uncompressed, so pad now */
if (!tar_write_padding_data(tar_data->currentfile, pad_to_size))
@@ -1009,7 +1013,7 @@ tar_sync(Walfile f)
* Always sync the whole tarfile, because that's all we can do. This makes
* no sense on compressed files, so just ignore those.
*/
- if (tar_data->compression_level)
+ if (tar_data->compression_method != COMPRESSION_NONE)
return 0;
r = fsync(tar_data->fd);
@@ -1030,7 +1034,7 @@ tar_close(Walfile f, WalCloseMethod method)
if (method == CLOSE_UNLINK)
{
- if (tar_data->compression_level)
+ if (tar_data->compression_method != COMPRESSION_NONE)
{
tar_set_error("unlink not supported with compression");
return -1;
@@ -1061,7 +1065,7 @@ tar_close(Walfile f, WalCloseMethod method)
*/
if (tf->pad_to_size)
{
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/*
* A compressed tarfile is padded on close since we cannot know
@@ -1102,7 +1106,7 @@ tar_close(Walfile f, WalCloseMethod method)
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Flush the current buffer */
if (!tar_write_compressed_data(NULL, 0, true))
@@ -1131,7 +1135,7 @@ tar_close(Walfile f, WalCloseMethod method)
tar_data->lasterrno = errno;
return -1;
}
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, tf->header, TAR_BLOCK_SIZE) != TAR_BLOCK_SIZE)
@@ -1142,7 +1146,7 @@ tar_close(Walfile f, WalCloseMethod method)
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Turn off compression */
if (deflateParams(tar_data->zp, 0, 0) != Z_OK)
@@ -1164,6 +1168,10 @@ tar_close(Walfile f, WalCloseMethod method)
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/* Move file pointer back down to end, so we can write the next file */
if (lseek(tar_data->fd, 0, SEEK_END) < 0)
@@ -1212,7 +1220,7 @@ tar_finish(void)
/* A tarfile always ends with two empty blocks */
MemSet(zerobuf, 0, sizeof(zerobuf));
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, zerobuf, sizeof(zerobuf)) != sizeof(zerobuf))
@@ -1223,7 +1231,7 @@ tar_finish(void)
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
if (!tar_write_compressed_data(zerobuf, sizeof(zerobuf), false))
return false;
@@ -1268,6 +1276,10 @@ tar_finish(void)
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/* sync the empty blocks as well, since they're after the last file */
if (tar_data->sync)
@@ -1312,7 +1324,8 @@ CreateWalTarMethod(const char *tarbase,
int compression_level, bool sync)
{
WalWriteMethod *method;
- const char *suffix = (compression_level != 0) ? ".tar.gz" : ".tar";
+ const char *suffix = (compression_method == COMPRESSION_GZIP) ?
+ ".tar.gz" : ".tar";
method = pg_malloc0(sizeof(WalWriteMethod));
method->open_for_write = tar_open_for_write;
@@ -1335,7 +1348,7 @@ CreateWalTarMethod(const char *tarbase,
tar_data->compression_level = compression_level;
tar_data->sync = sync;
#ifdef HAVE_LIBZ
- if (compression_level)
+ if (compression_method == COMPRESSION_GZIP)
tar_data->zlibOut = (char *) pg_malloc(ZLIB_OUT_SIZE + 1);
#endif
@@ -1347,7 +1360,7 @@ FreeWalTarMethod(void)
{
pg_free(tar_data->tarfilename);
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
pg_free(tar_data->zlibOut);
#endif
pg_free(tar_data);
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..9cbec844c8 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -372,9 +372,9 @@ PostgreSQL documentation
<listitem>
<para>
Enables gzip compression of tar file output, and specifies the
- compression level (0 through 9, 0 being no compression and 9 being best
- compression). Compression is only available when using the tar
- format, and the suffix <filename>.gz</filename> will
+ compression level (1 through 9, 1 being worst compression and 9
+ being best compression). Compression is only available when using
+ the tar format, and the suffix <filename>.gz</filename> will
automatically be added to all tar filenames.
</para>
</listitem>
@@ -567,6 +567,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--compression-method=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Enables compression of backup data using the specified method.
+ Supported values <literal>gzip</literal> and <literal>none</literal>,
+ the default.
+ </para>
+
+ <para>
+ The suffix <filename>.gz</filename> will automatically be added to
+ all filenames when using <literal>gzip</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--manifest-force-encode</option></term>
<listitem>
--
2.34.1
On Sat, Dec 18, 2021 at 6:29 AM Michael Paquier <michael@paquier.xyz> wrote:
This is one step toward the introduction of LZ4 in pg_basebackup, but
this refactoring is worth doing on its own, hence a separate thread to
deal with this problem first. The options of pg_basebackup are
reworked to be consistent with pg_receivewal, as follows:
- --compress ranges now from 1 to 9, instead of 0 to 9.
- --compression-method={none,gzip} is added, the default is none, same
as HEAD.
- --gzip/-z has the same meaning as before, being just a synonym of
--compression-method=gzip with the default compression level of ZLIB
assigned if there is no --compress.
One thing we should keep in mind is that I'm also working on adding
server-side compression, initially with gzip, but Jeevan Ladhe has
posted patches to extend that to LZ4. So however we structure the
options they should take that into account. Currently that patch set
adds --server-compression={none,gzip,gzipN} where N from 1 to 9, but
perhaps it should be done differently. Not sure.
One more thing that I have noticed while hacking this stuff is that we
have no regression tests for gzip with pg_basebackup, so I have added
some that are skipped when not compiling the code with ZLIB.
If they don't decompress the backup and run pg_verifybackup on it then
I'm not sure how much they help. Yet, I don't know how to do that
portably.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Dec 20, 2021 at 10:19:44AM -0500, Robert Haas wrote:
One thing we should keep in mind is that I'm also working on adding
server-side compression, initially with gzip, but Jeevan Ladhe has
posted patches to extend that to LZ4. So however we structure the
options they should take that into account. Currently that patch set
adds --server-compression={none,gzip,gzipN} where N from 1 to 9, but
perhaps it should be done differently. Not sure.
Yeah, consistency would be good. For the client-side compression of
LZ4, we have shaped things around the existing --compress option, and
there is 6f164e6 that offers an API to parse that at option-level,
meaning less custom error strings. I'd like to think that splitting
the compression level and the compression method is still the right
choice, except if --server-compression combined with a client-side
compression is a legal combination. This would not really make sense,
though, no? So we'd better block this possibility from the start?
One more thing that I have noticed while hacking this stuff is that we
have no regression tests for gzip with pg_basebackup, so I have added
some that are skipped when not compiling the code with ZLIB.If they don't decompress the backup and run pg_verifybackup on it then
I'm not sure how much they help. Yet, I don't know how to do that
portably.
They help in checking that an environment does not use a buggy set of
GZIP, at least. Using pg_verifybackup on a base backup with
--format='t' could be tweaked with $ENV{TAR} for the tarballs
generation, for example, as we do in some other tests. Option sets
like "xvf" or "zxvf" should be rather portable across the buildfarm,
no? I'd like to think that this is not a requirement for adding
checks in the compression path, as a first step, though, but I agree
that it could be extended more.
--
Michael
Hi,
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, December 18th, 2021 at 12:29 PM, Michael Paquier
<michael@paquier.xyz> wrote:
Hi all,
(Added Georgios in CC.)
thank you for the shout out.
When working on the support of LZ4 for pg_receivewal, walmethods.c has
gained an extra parameter for the compression method. It gets used on
the DirectoryMethod instead of the compression level to decide which
type of compression is used. One thing that I left out during this
previous work is that the TarMethod also gained knowledge of this
compression method, but we still use the compression level to check if
tars should be compressed or not.This is wrong on multiple aspects. First, this is not consistent with
the directory method, making walmethods.c harder to figure out.
Second, this is not extensible if we want to introduce more
compression methods in pg_basebackup, like LZ4. This reflects on the
options used by pg_receivewal and pg_basebackup, that are not
inconsistent as well.
Agreed with all the above.
The attached patch refactors the code of pg_basebackup and the
TarMethod of walmethods.c to use the compression method where it
should, splitting entirely the logic related the compression level.
Thanks.
This is one step toward the introduction of LZ4 in pg_basebackup, but
this refactoring is worth doing on its own, hence a separate thread to
deal with this problem first. The options of pg_basebackup are
reworked to be consistent with pg_receivewal, as follows:
- --compress ranges now from 1 to 9, instead of 0 to 9.
- --compression-method={none,gzip} is added, the default is none, same
as HEAD.
- --gzip/-z has the same meaning as before, being just a synonym of
--compression-method=gzip with the default compression level of ZLIB
assigned if there is no --compress.
Indeed this is consistent with pg_receivewal. It gets my +1.
One more thing that I have noticed while hacking this stuff is that we
have no regression tests for gzip with pg_basebackup, so I have added
some that are skipped when not compiling the code with ZLIB.
As far as the code is concerned, I have a minor nitpick.
+ if (compression_method == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compression_method == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
The above block moves the initialization of 'streamer' within two conditional
blocks. Despite this being correct, it is possible that some compilers will
complain for lack of initialization of 'streamer' when it is eventually used a
bit further ahead in:
if (must_parse_archive)
streamer = bbstreamer_tar_archiver_new(streamer);
I propose to initialize streamer to NULL when declared at the top of
CreateBackupStreamer().
As far as the tests are concerned, I think that 2 too many tests are skipped
when HAVE_LIBZ is not defined to be 1. The patch reads:
+Check ZLIB compression if available.
+SKIP:
+{
+ skip "postgres was not built with ZLIB support", 5
+ if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_gzip", '--compression-method',
+ 'gzip', '--compress', '1', '--no-sync', '--format', 't'
+ ],
+ 'pg_basebackup with --compress and --compression-method=gzip');
+
+ # Verify that the stored files are generated with their expected
+ # names.
+ my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+ is(scalar(@zlib_files), 2,
+ "two files created with gzip (base.tar.gz and pg_wal.tar.gz)");
+
+ # Check the integrity of the files generated.
+ my $gzip = $ENV{GZIP_PROGRAM};
+ skip "program gzip is not found in your system", 1
+ if ( !defined $gzip
+ || $gzip eq ''
+ || system_log($gzip, '--version') != 0);
+
+ my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
+ is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+ rmtree("$tempdir/backup_gzip");
+}
You can see that after the check_pg_config() test, only 3 tests follow,
namely:
* $node->command_ok()
* is(scalar(), ...)
* is($gzip_is_valid, ...)
Opinions?
Other than the minor issues above, I think this is a solid improvement. +1
--
Michael
Cheers,
//Georgios
(Added Jeevan in CC, for awareness)
On Mon, Jan 03, 2022 at 03:35:57PM +0000, gkokolatos@pm.me wrote:
I propose to initialize streamer to NULL when declared at the top of
CreateBackupStreamer().
Yes, that may be noisy. Done this way.
You can see that after the check_pg_config() test, only 3 tests follow,
namely:
* $node->command_ok()
* is(scalar(), ...)
* is($gzip_is_valid, ...)
Indeed. The CF bot was complaining about that, actually.
Thinking more about this stuff, pg_basebackup --compress is an option
that exists already for a couple of years, and that's independent of
the backend-side compression that Robert and Jeevan are working on, so
I'd like to move on this code cleanup. We can always figure out the
LZ4 part for pg_basebackup after, if necessary.
Attached is an updated patch. The CF bot should improve with that.
--
Michael
Attachments:
v2-0001-Refactor-options-of-pg_basebackup-for-compression.patchtext/x-diff; charset=us-asciiDownload
From a216d5569fa6d963c2f846d9f5539d5ea7ddbd62 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 5 Jan 2022 16:58:00 +0900
Subject: [PATCH v2] Refactor options of pg_basebackup for compression level
and methods
---
src/bin/pg_basebackup/pg_basebackup.c | 81 +++++++++++++++-----
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 47 +++++++++++-
src/bin/pg_basebackup/walmethods.c | 47 ++++++++----
doc/src/sgml/ref/pg_basebackup.sgml | 22 +++++-
4 files changed, 156 insertions(+), 41 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1739ac6382..7f29200832 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -113,6 +113,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static WalCompressionMethod compression_method = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -359,7 +360,9 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -Z, --compress=1-9 compress tar output with given compression level\n"));
+ printf(_(" --compression-method=METHOD\n"
+ " method to compress data\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -524,7 +527,7 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE, /* ignored */
+ compression_method,
compresslevel,
stream.do_sync);
@@ -975,7 +978,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile)
{
- bbstreamer *streamer;
+ bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool must_parse_archive;
@@ -1034,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
+ if (compression_method == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compression_method == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/*
* If we need to parse the archive for whatever reason, then we'll
@@ -1765,6 +1771,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"compression-method", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -1872,14 +1879,10 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
+ compression_method = COMPRESSION_GZIP;
break;
case 'Z':
- if (!option_parse_int(optarg, "-Z/--compress", 0, 9,
+ if (!option_parse_int(optarg, "-Z/--compress", 1, 9,
&compresslevel))
exit(1);
break;
@@ -1941,6 +1944,18 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ if (pg_strcasecmp(optarg, "gzip") == 0)
+ compression_method = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(optarg, "none") == 0)
+ compression_method = COMPRESSION_NONE;
+ else
+ {
+ pg_log_error("invalid value \"%s\" for option %s",
+ optarg, "--compression-method");
+ exit(1);
+ }
+ break;
default:
/*
@@ -1978,7 +1993,7 @@ main(int argc, char **argv)
/*
* Mutually exclusive arguments
*/
- if (format == 'p' && compresslevel != 0)
+ if (format == 'p' && compression_method != COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2056,13 +2071,39 @@ main(int argc, char **argv)
}
}
-#ifndef HAVE_LIBZ
- if (compresslevel != 0)
+ /*
+ * Compression-related options.
+ */
+ switch (compression_method)
{
- pg_log_error("this build does not support compression");
- exit(1);
- }
+ case COMPRESSION_NONE:
+ if (compresslevel != 0)
+ {
+ pg_log_error("cannot use --compress with --compression-method=%s",
+ "none");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ break;
+ case COMPRESSION_GZIP:
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ {
+ pg_log_info("no value specified for --compress, switching to default");
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ }
+#else
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ break;
+ case COMPRESSION_LZ4:
+ /* option not supported */
+ Assert(false);
+ break;
+ }
if (showprogress && !estimatesize)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e56825382c..e5fec5f813 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 110;
+use Test::More tests => 115;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -31,6 +31,17 @@ my $pgdata = $node->data_dir;
$node->command_fails(['pg_basebackup'],
'pg_basebackup needs target directory specified');
+# Sanity checks for options
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compression-method',
+ 'none', '--compress',
+ '1'
+ ],
+ qr/\Qpg_basebackup: error: cannot use --compress with --compression-method=none/,
+ 'failure if --compress specified with --compression-method=none');
+
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -624,3 +635,37 @@ rmtree("$tempdir/backup_corrupt4");
$node->safe_psql('postgres', "DROP TABLE corrupt1;");
$node->safe_psql('postgres', "DROP TABLE corrupt2;");
+
+note "Testing pg_basebackup with compression methods";
+
+# Check ZLIB compression if available.
+SKIP:
+{
+ skip "postgres was not built with ZLIB support", 3
+ if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_gzip", '--compression-method',
+ 'gzip', '--compress', '1', '--no-sync', '--format', 't'
+ ],
+ 'pg_basebackup with --compress and --compression-method=gzip');
+
+ # Verify that the stored files are generated with their expected
+ # names.
+ my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+ is(scalar(@zlib_files), 2,
+ "two files created with gzip (base.tar.gz and pg_wal.tar.gz)");
+
+ # Check the integrity of the files generated.
+ my $gzip = $ENV{GZIP_PROGRAM};
+ skip "program gzip is not found in your system", 1
+ if ( !defined $gzip
+ || $gzip eq ''
+ || system_log($gzip, '--version') != 0);
+
+ my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
+ is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+ rmtree("$tempdir/backup_gzip");
+}
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index affdc5055f..127bb5e013 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -749,7 +749,7 @@ tar_write(Walfile f, const void *buf, size_t count)
tar_clear_error();
/* Tarfile will always be positioned at the end */
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
r = write(tar_data->fd, buf, count);
@@ -833,7 +833,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
tar_data->zp = (z_streamp) pg_malloc(sizeof(z_stream));
tar_data->zp->zalloc = Z_NULL;
@@ -884,7 +884,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
pg_free(tmppath);
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Flush existing data */
if (!tar_write_compressed_data(NULL, 0, true))
@@ -909,7 +909,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
tar_data->currentfile->currpos = 0;
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, tar_data->currentfile->header,
@@ -923,7 +923,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Write header through the zlib APIs but with no compression */
if (!tar_write_compressed_data(tar_data->currentfile->header,
@@ -938,6 +938,10 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
tar_data->currentfile->pathname = pg_strdup(pathname);
@@ -948,7 +952,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
if (pad_to_size)
{
tar_data->currentfile->pad_to_size = pad_to_size;
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
/* Uncompressed, so pad now */
if (!tar_write_padding_data(tar_data->currentfile, pad_to_size))
@@ -1009,7 +1013,7 @@ tar_sync(Walfile f)
* Always sync the whole tarfile, because that's all we can do. This makes
* no sense on compressed files, so just ignore those.
*/
- if (tar_data->compression_level)
+ if (tar_data->compression_method != COMPRESSION_NONE)
return 0;
r = fsync(tar_data->fd);
@@ -1030,7 +1034,7 @@ tar_close(Walfile f, WalCloseMethod method)
if (method == CLOSE_UNLINK)
{
- if (tar_data->compression_level)
+ if (tar_data->compression_method != COMPRESSION_NONE)
{
tar_set_error("unlink not supported with compression");
return -1;
@@ -1061,7 +1065,7 @@ tar_close(Walfile f, WalCloseMethod method)
*/
if (tf->pad_to_size)
{
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/*
* A compressed tarfile is padded on close since we cannot know
@@ -1102,7 +1106,7 @@ tar_close(Walfile f, WalCloseMethod method)
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Flush the current buffer */
if (!tar_write_compressed_data(NULL, 0, true))
@@ -1131,7 +1135,7 @@ tar_close(Walfile f, WalCloseMethod method)
tar_data->lasterrno = errno;
return -1;
}
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, tf->header, TAR_BLOCK_SIZE) != TAR_BLOCK_SIZE)
@@ -1142,7 +1146,7 @@ tar_close(Walfile f, WalCloseMethod method)
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Turn off compression */
if (deflateParams(tar_data->zp, 0, 0) != Z_OK)
@@ -1164,6 +1168,10 @@ tar_close(Walfile f, WalCloseMethod method)
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/* Move file pointer back down to end, so we can write the next file */
if (lseek(tar_data->fd, 0, SEEK_END) < 0)
@@ -1212,7 +1220,7 @@ tar_finish(void)
/* A tarfile always ends with two empty blocks */
MemSet(zerobuf, 0, sizeof(zerobuf));
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, zerobuf, sizeof(zerobuf)) != sizeof(zerobuf))
@@ -1223,7 +1231,7 @@ tar_finish(void)
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
if (!tar_write_compressed_data(zerobuf, sizeof(zerobuf), false))
return false;
@@ -1268,6 +1276,10 @@ tar_finish(void)
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/* sync the empty blocks as well, since they're after the last file */
if (tar_data->sync)
@@ -1312,7 +1324,8 @@ CreateWalTarMethod(const char *tarbase,
int compression_level, bool sync)
{
WalWriteMethod *method;
- const char *suffix = (compression_level != 0) ? ".tar.gz" : ".tar";
+ const char *suffix = (compression_method == COMPRESSION_GZIP) ?
+ ".tar.gz" : ".tar";
method = pg_malloc0(sizeof(WalWriteMethod));
method->open_for_write = tar_open_for_write;
@@ -1335,7 +1348,7 @@ CreateWalTarMethod(const char *tarbase,
tar_data->compression_level = compression_level;
tar_data->sync = sync;
#ifdef HAVE_LIBZ
- if (compression_level)
+ if (compression_method == COMPRESSION_GZIP)
tar_data->zlibOut = (char *) pg_malloc(ZLIB_OUT_SIZE + 1);
#endif
@@ -1347,7 +1360,7 @@ FreeWalTarMethod(void)
{
pg_free(tar_data->tarfilename);
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
pg_free(tar_data->zlibOut);
#endif
pg_free(tar_data);
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..9cbec844c8 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -372,9 +372,9 @@ PostgreSQL documentation
<listitem>
<para>
Enables gzip compression of tar file output, and specifies the
- compression level (0 through 9, 0 being no compression and 9 being best
- compression). Compression is only available when using the tar
- format, and the suffix <filename>.gz</filename> will
+ compression level (1 through 9, 1 being worst compression and 9
+ being best compression). Compression is only available when using
+ the tar format, and the suffix <filename>.gz</filename> will
automatically be added to all tar filenames.
</para>
</listitem>
@@ -567,6 +567,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--compression-method=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Enables compression of backup data using the specified method.
+ Supported values <literal>gzip</literal> and <literal>none</literal>,
+ the default.
+ </para>
+
+ <para>
+ The suffix <filename>.gz</filename> will automatically be added to
+ all filenames when using <literal>gzip</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--manifest-force-encode</option></term>
<listitem>
--
2.34.1
On Wednesday, January 5th, 2022 at 9:00 AM, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jan 03, 2022 at 03:35:57PM +0000, gkokolatos@pm.me wrote:
I propose to initialize streamer to NULL when declared at the top of
CreateBackupStreamer().Yes, that may be noisy. Done this way.
Great.
You can see that after the check_pg_config() test, only 3 tests follow,
namely:
* $node->command_ok()
* is(scalar(), ...)
* is($gzip_is_valid, ...)Indeed. The CF bot was complaining about that, actually.
Great.
Thinking more about this stuff, pg_basebackup --compress is an option
that exists already for a couple of years, and that's independent of
the backend-side compression that Robert and Jeevan are working on, so
I'd like to move on this code cleanup. We can always figure out the
LZ4 part for pg_basebackup after, if necessary.
I agree that the cleanup in itself is helpful. It feels awkward to have two
utilities under the same path, with distinct options for the same
functionality.
When the backend-side compression is completed, were there really be a need for
client-side compression? If yes, then it seems logical to have distinct options
for them and this cleanup makes sense. If not, then it seems logical to maintain
the current options list and 'simply' change the internals of the code, and this
cleanup makes sense.
Attached is an updated patch. The CF bot should improve with that.
+1
--
Michael
Cheers,
//Georgios
On Mon, Dec 20, 2021 at 7:43 PM Michael Paquier <michael@paquier.xyz> wrote:
Yeah, consistency would be good. For the client-side compression of
LZ4, we have shaped things around the existing --compress option, and
there is 6f164e6 that offers an API to parse that at option-level,
meaning less custom error strings. I'd like to think that splitting
the compression level and the compression method is still the right
choice, except if --server-compression combined with a client-side
compression is a legal combination. This would not really make sense,
though, no? So we'd better block this possibility from the start?
Right. It's blocked right now, but Tushar noticed on the other thread
that the error message isn't as good as it could be, so I'll improve
that a bit. Still the issue wasn't overlooked.
If they don't decompress the backup and run pg_verifybackup on it then
I'm not sure how much they help. Yet, I don't know how to do that
portably.They help in checking that an environment does not use a buggy set of
GZIP, at least. Using pg_verifybackup on a base backup with
--format='t' could be tweaked with $ENV{TAR} for the tarballs
generation, for example, as we do in some other tests. Option sets
like "xvf" or "zxvf" should be rather portable across the buildfarm,
no? I'd like to think that this is not a requirement for adding
checks in the compression path, as a first step, though, but I agree
that it could be extended more.
Oh, well, if we have a working tar available, then it's not so bad. I
was thinking we couldn't really count on that, especially on Windows.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
Oh, well, if we have a working tar available, then it's not so bad. I
was thinking we couldn't really count on that, especially on Windows.
I think the existing precedent is to skip the test if tar isn't there,
cf pg_basebackup/t/010_pg_basebackup.pl. But certainly the majority of
buildfarm animals have it.
regards, tom lane
On Wed, Jan 5, 2022 at 4:17 AM <gkokolatos@pm.me> wrote:
When the backend-side compression is completed, were there really be a need for
client-side compression? If yes, then it seems logical to have distinct options
for them and this cleanup makes sense. If not, then it seems logical to maintain
the current options list and 'simply' change the internals of the code, and this
cleanup makes sense.
I think we're going to want to offer both options. We can't know
whether the user prefers to consume CPU cycles on the server or on the
client. Compressing on the server has the advantage of potentially
saving transfer bandwidth, but the server is also often the busiest
part of the whole system, and users are often keen to offload as much
work as possible.
Given that, I'd like us to be thinking about what the full set of
options looks like once we have (1) compression on either the server
or the client and (2) multiple compression algorithms and (3) multiple
compression levels. Personally, I don't really like the decision made
by this proposed patch. In this patch's view of the world, -Z is a way
of providing the compression level for whatever compression algorithm
you happen to have selected, but I think of -Z as being the upper-case
version of -z which I think of as selecting specifically gzip. It's
not particularly intuitive to me that in a command like pg_basebackup
--compress=<something>, <something> is a compression level rather than
an algorithm. So what I would propose is probably something like:
pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER]
pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER]
And then make -z short for --compress=gzip and -Z <n> short for
--compress=gzip --compression-level=<n>. That would be a
backward-incompatible change to the definition of --compress, but as
long as -Z <n> works the same as today, I don't think many people will
notice. If we like, we can notice if the argument to --compress is an
integer and suggest using either -Z or --compress=gzip
--compression-level=<n> instead.
In the proposed patch, you end up with pg_basebackup
--compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
find that quite odd, though as with all such things, opinions may
vary. In my proposal, that would be an error, because it would be
equivalent to --compress=lz4 --compress=gzip --compression-level=2,
and would thus involve conflicting compression method specifications.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jan 05, 2022 at 10:33:38AM -0500, Robert Haas wrote:
I think we're going to want to offer both options. We can't know
whether the user prefers to consume CPU cycles on the server or on the
client. Compressing on the server has the advantage of potentially
saving transfer bandwidth, but the server is also often the busiest
part of the whole system, and users are often keen to offload as much
work as possible.
Yeah. There are cases for both. I just got to wonder whether it
makes sense to allow both server-side and client-side compression to
be used at the same time. That would be a rather strange case, but
well, with the correct set of options that could be possible.
Given that, I'd like us to be thinking about what the full set of
options looks like once we have (1) compression on either the server
or the client and (2) multiple compression algorithms and (3) multiple
compression levels. Personally, I don't really like the decision made
by this proposed patch. In this patch's view of the world, -Z is a way
of providing the compression level for whatever compression algorithm
you happen to have selected, but I think of -Z as being the upper-case
version of -z which I think of as selecting specifically gzip. It's
not particularly intuitive to me that in a command like pg_basebackup
--compress=<something>, <something> is a compression level rather than
an algorithm. So what I would propose is probably something like:pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER]
pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER]And then make -z short for --compress=gzip and -Z <n> short for
--compress=gzip --compression-level=<n>. That would be a
backward-incompatible change to the definition of --compress, but as
long as -Z <n> works the same as today, I don't think many people will
notice. If we like, we can notice if the argument to --compress is an
integer and suggest using either -Z or --compress=gzip
--compression-level=<n> instead.
My view of things is slightly different, aka I'd rather keep
--compress to mean a compression level with an integer option, but
introduce a --compression-method={lz4,gzip,none}, with -Z being a
synonym of --compression-method=gzip. That's at least the path we
chose for pg_receivewal. I don't mind sticking with one way or
another, as what you are proposing is basically the same thing I have
in mind, but both tools ought to use the same set of options.
Hmm. Perhaps at the end the problem is with --compress, where we
don't know if it means a compression level or a compression method?
For me, --compress means the former, and for you the latter. So a
third way of seeing things is to drop completely --compress, but have
one --compression-method and one --compression-level. That would
bring a clear split. Or just one --compression-method for the
client-side compression as you are proposing for the server-side
compression, however I'd like to think that a split between the method
and level is more intuitive.
In the proposed patch, you end up with pg_basebackup
--compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
find that quite odd, though as with all such things, opinions may
vary. In my proposal, that would be an error, because it would be
equivalent to --compress=lz4 --compress=gzip --compression-level=2,
and would thus involve conflicting compression method specifications.
It seems to me that you did not read the patch closely enough. The
attached patch does not add support for LZ4 in pg_basebackup on the
client-side yet. Once it gets added, though, the idea is that using
--compress with LZ4 would result in an error. That's what happens
with pg_receivewal on HEAD, for one. The patch just shapes things to
plug LZ4 more easily in the existing code of pg_basebackup.c, and
walmethods.c.
So.. As of now, it is actually possible to cut the pie in three
parts. There are no real objections to the cleanup of walmethods.c
and the addition of some conditional TAP tests with pg_basebackup and
client-side compression, as far as I can see, only to the option
renaming part. Attached are two patches, then. 0001 is the cleanup
of walmethods.c to rely the compression method, with more tests (tests
that could be moved into their own patch, as well). 0002 is the
addition of the options I suggested upthread, but we may change that
depending on what gets used for the server-side compression for
consistency so I am not suggesting to merge that until we agree on the
full picture. The point of this thread was mostly about 0001, so I am
fine to discard 0002. Thoughts?
--
Michael
Attachments:
v3-0001-Refactor-walmethods.c-s-tar-method-to-use-compres.patchtext/x-diff; charset=us-asciiDownload
From b2beb69c23ea8346a684e8c7ae5a5b60bade2066 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 6 Jan 2022 13:46:56 +0900
Subject: [PATCH v3 1/2] Refactor walmethods.c's tar method to use compression
method
Some TAP tests are added for pg_basebackup --compress, while on it.
---
src/bin/pg_basebackup/pg_basebackup.c | 3 +-
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 36 ++++++++++++++-
src/bin/pg_basebackup/walmethods.c | 47 +++++++++++++-------
3 files changed, 67 insertions(+), 19 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1739ac6382..17ff0132d9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -524,7 +524,8 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE, /* ignored */
+ (compresslevel > 0) ?
+ COMPRESSION_GZIP : COMPRESSION_NONE,
compresslevel,
stream.do_sync);
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e56825382c..192781e0a2 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 110;
+use Test::More tests => 113;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -624,3 +624,37 @@ rmtree("$tempdir/backup_corrupt4");
$node->safe_psql('postgres', "DROP TABLE corrupt1;");
$node->safe_psql('postgres', "DROP TABLE corrupt2;");
+
+note "Testing pg_basebackup with compression methods";
+
+# Check ZLIB compression if available.
+SKIP:
+{
+ skip "postgres was not built with ZLIB support", 3
+ if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_gzip", '--compress', '1',
+ '--no-sync', '--format', 't'
+ ],
+ 'pg_basebackup with --compress');
+
+ # Verify that the stored files are generated with their expected
+ # names.
+ my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+ is(scalar(@zlib_files), 2,
+ "two files created with gzip (base.tar.gz and pg_wal.tar.gz)");
+
+ # Check the integrity of the files generated.
+ my $gzip = $ENV{GZIP_PROGRAM};
+ skip "program gzip is not found in your system", 1
+ if ( !defined $gzip
+ || $gzip eq ''
+ || system_log($gzip, '--version') != 0);
+
+ my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
+ is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+ rmtree("$tempdir/backup_gzip");
+}
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index affdc5055f..127bb5e013 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -749,7 +749,7 @@ tar_write(Walfile f, const void *buf, size_t count)
tar_clear_error();
/* Tarfile will always be positioned at the end */
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
r = write(tar_data->fd, buf, count);
@@ -833,7 +833,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
tar_data->zp = (z_streamp) pg_malloc(sizeof(z_stream));
tar_data->zp->zalloc = Z_NULL;
@@ -884,7 +884,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
pg_free(tmppath);
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Flush existing data */
if (!tar_write_compressed_data(NULL, 0, true))
@@ -909,7 +909,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
tar_data->currentfile->currpos = 0;
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, tar_data->currentfile->header,
@@ -923,7 +923,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Write header through the zlib APIs but with no compression */
if (!tar_write_compressed_data(tar_data->currentfile->header,
@@ -938,6 +938,10 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
tar_data->currentfile->pathname = pg_strdup(pathname);
@@ -948,7 +952,7 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
if (pad_to_size)
{
tar_data->currentfile->pad_to_size = pad_to_size;
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
/* Uncompressed, so pad now */
if (!tar_write_padding_data(tar_data->currentfile, pad_to_size))
@@ -1009,7 +1013,7 @@ tar_sync(Walfile f)
* Always sync the whole tarfile, because that's all we can do. This makes
* no sense on compressed files, so just ignore those.
*/
- if (tar_data->compression_level)
+ if (tar_data->compression_method != COMPRESSION_NONE)
return 0;
r = fsync(tar_data->fd);
@@ -1030,7 +1034,7 @@ tar_close(Walfile f, WalCloseMethod method)
if (method == CLOSE_UNLINK)
{
- if (tar_data->compression_level)
+ if (tar_data->compression_method != COMPRESSION_NONE)
{
tar_set_error("unlink not supported with compression");
return -1;
@@ -1061,7 +1065,7 @@ tar_close(Walfile f, WalCloseMethod method)
*/
if (tf->pad_to_size)
{
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/*
* A compressed tarfile is padded on close since we cannot know
@@ -1102,7 +1106,7 @@ tar_close(Walfile f, WalCloseMethod method)
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Flush the current buffer */
if (!tar_write_compressed_data(NULL, 0, true))
@@ -1131,7 +1135,7 @@ tar_close(Walfile f, WalCloseMethod method)
tar_data->lasterrno = errno;
return -1;
}
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, tf->header, TAR_BLOCK_SIZE) != TAR_BLOCK_SIZE)
@@ -1142,7 +1146,7 @@ tar_close(Walfile f, WalCloseMethod method)
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
/* Turn off compression */
if (deflateParams(tar_data->zp, 0, 0) != Z_OK)
@@ -1164,6 +1168,10 @@ tar_close(Walfile f, WalCloseMethod method)
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/* Move file pointer back down to end, so we can write the next file */
if (lseek(tar_data->fd, 0, SEEK_END) < 0)
@@ -1212,7 +1220,7 @@ tar_finish(void)
/* A tarfile always ends with two empty blocks */
MemSet(zerobuf, 0, sizeof(zerobuf));
- if (!tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_NONE)
{
errno = 0;
if (write(tar_data->fd, zerobuf, sizeof(zerobuf)) != sizeof(zerobuf))
@@ -1223,7 +1231,7 @@ tar_finish(void)
}
}
#ifdef HAVE_LIBZ
- else
+ else if (tar_data->compression_method == COMPRESSION_GZIP)
{
if (!tar_write_compressed_data(zerobuf, sizeof(zerobuf), false))
return false;
@@ -1268,6 +1276,10 @@ tar_finish(void)
}
}
#endif
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/* sync the empty blocks as well, since they're after the last file */
if (tar_data->sync)
@@ -1312,7 +1324,8 @@ CreateWalTarMethod(const char *tarbase,
int compression_level, bool sync)
{
WalWriteMethod *method;
- const char *suffix = (compression_level != 0) ? ".tar.gz" : ".tar";
+ const char *suffix = (compression_method == COMPRESSION_GZIP) ?
+ ".tar.gz" : ".tar";
method = pg_malloc0(sizeof(WalWriteMethod));
method->open_for_write = tar_open_for_write;
@@ -1335,7 +1348,7 @@ CreateWalTarMethod(const char *tarbase,
tar_data->compression_level = compression_level;
tar_data->sync = sync;
#ifdef HAVE_LIBZ
- if (compression_level)
+ if (compression_method == COMPRESSION_GZIP)
tar_data->zlibOut = (char *) pg_malloc(ZLIB_OUT_SIZE + 1);
#endif
@@ -1347,7 +1360,7 @@ FreeWalTarMethod(void)
{
pg_free(tar_data->tarfilename);
#ifdef HAVE_LIBZ
- if (tar_data->compression_level)
+ if (tar_data->compression_method == COMPRESSION_GZIP)
pg_free(tar_data->zlibOut);
#endif
pg_free(tar_data);
--
2.34.1
v3-0002-Rework-the-option-set-of-pg_basebackup.patchtext/x-diff; charset=us-asciiDownload
From c6518d28cbc9e18f5445163a756692333529519d Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 6 Jan 2022 13:53:46 +0900
Subject: [PATCH v3 2/2] Rework the option set of pg_basebackup
---
src/bin/pg_basebackup/pg_basebackup.c | 82 +++++++++++++++-----
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 19 ++++-
doc/src/sgml/ref/pg_basebackup.sgml | 22 +++++-
3 files changed, 95 insertions(+), 28 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 17ff0132d9..7f29200832 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -113,6 +113,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static WalCompressionMethod compression_method = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -359,7 +360,9 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -Z, --compress=1-9 compress tar output with given compression level\n"));
+ printf(_(" --compression-method=METHOD\n"
+ " method to compress data\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -524,8 +527,7 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- (compresslevel > 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE,
+ compression_method,
compresslevel,
stream.do_sync);
@@ -976,7 +978,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile)
{
- bbstreamer *streamer;
+ bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool must_parse_archive;
@@ -1035,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
+ if (compression_method == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compression_method == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/*
* If we need to parse the archive for whatever reason, then we'll
@@ -1766,6 +1771,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"compression-method", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -1873,14 +1879,10 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
+ compression_method = COMPRESSION_GZIP;
break;
case 'Z':
- if (!option_parse_int(optarg, "-Z/--compress", 0, 9,
+ if (!option_parse_int(optarg, "-Z/--compress", 1, 9,
&compresslevel))
exit(1);
break;
@@ -1942,6 +1944,18 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ if (pg_strcasecmp(optarg, "gzip") == 0)
+ compression_method = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(optarg, "none") == 0)
+ compression_method = COMPRESSION_NONE;
+ else
+ {
+ pg_log_error("invalid value \"%s\" for option %s",
+ optarg, "--compression-method");
+ exit(1);
+ }
+ break;
default:
/*
@@ -1979,7 +1993,7 @@ main(int argc, char **argv)
/*
* Mutually exclusive arguments
*/
- if (format == 'p' && compresslevel != 0)
+ if (format == 'p' && compression_method != COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2057,13 +2071,39 @@ main(int argc, char **argv)
}
}
-#ifndef HAVE_LIBZ
- if (compresslevel != 0)
+ /*
+ * Compression-related options.
+ */
+ switch (compression_method)
{
- pg_log_error("this build does not support compression");
- exit(1);
- }
+ case COMPRESSION_NONE:
+ if (compresslevel != 0)
+ {
+ pg_log_error("cannot use --compress with --compression-method=%s",
+ "none");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ break;
+ case COMPRESSION_GZIP:
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ {
+ pg_log_info("no value specified for --compress, switching to default");
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ }
+#else
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ break;
+ case COMPRESSION_LZ4:
+ /* option not supported */
+ Assert(false);
+ break;
+ }
if (showprogress && !estimatesize)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 192781e0a2..e5fec5f813 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 113;
+use Test::More tests => 115;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -31,6 +31,17 @@ my $pgdata = $node->data_dir;
$node->command_fails(['pg_basebackup'],
'pg_basebackup needs target directory specified');
+# Sanity checks for options
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compression-method',
+ 'none', '--compress',
+ '1'
+ ],
+ qr/\Qpg_basebackup: error: cannot use --compress with --compression-method=none/,
+ 'failure if --compress specified with --compression-method=none');
+
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -636,10 +647,10 @@ SKIP:
$node->command_ok(
[
'pg_basebackup', '-D',
- "$tempdir/backup_gzip", '--compress', '1',
- '--no-sync', '--format', 't'
+ "$tempdir/backup_gzip", '--compression-method',
+ 'gzip', '--compress', '1', '--no-sync', '--format', 't'
],
- 'pg_basebackup with --compress');
+ 'pg_basebackup with --compress and --compression-method=gzip');
# Verify that the stored files are generated with their expected
# names.
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..9cbec844c8 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -372,9 +372,9 @@ PostgreSQL documentation
<listitem>
<para>
Enables gzip compression of tar file output, and specifies the
- compression level (0 through 9, 0 being no compression and 9 being best
- compression). Compression is only available when using the tar
- format, and the suffix <filename>.gz</filename> will
+ compression level (1 through 9, 1 being worst compression and 9
+ being best compression). Compression is only available when using
+ the tar format, and the suffix <filename>.gz</filename> will
automatically be added to all tar filenames.
</para>
</listitem>
@@ -567,6 +567,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--compression-method=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+ <para>
+ Enables compression of backup data using the specified method.
+ Supported values <literal>gzip</literal> and <literal>none</literal>,
+ the default.
+ </para>
+
+ <para>
+ The suffix <filename>.gz</filename> will automatically be added to
+ all filenames when using <literal>gzip</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--manifest-force-encode</option></term>
<listitem>
--
2.34.1
On Wed, Jan 05, 2022 at 10:22:06AM -0500, Tom Lane wrote:
I think the existing precedent is to skip the test if tar isn't there,
cf pg_basebackup/t/010_pg_basebackup.pl. But certainly the majority of
buildfarm animals have it.
Even Windows environments should be fine, aka recent edc2332.
--
Michael
On Thu, Jan 6, 2022 at 12:04 AM Michael Paquier <michael@paquier.xyz> wrote:
Yeah. There are cases for both. I just got to wonder whether it
makes sense to allow both server-side and client-side compression to
be used at the same time. That would be a rather strange case, but
well, with the correct set of options that could be possible.
I don't think it makes sense to support that. On the one hand, it
doesn't seem useful: compressing already-compressed data doesn't
usually work very well. Alternatively, I suppose the intent could be
to compress one way for transfer and then decompress and recompress
for storage, but that seems too inefficient to take seriously. On the
other hand, it requires a more complex user interface, and it's
already fairly complicated anyway.
My view of things is slightly different, aka I'd rather keep
--compress to mean a compression level with an integer option, but
introduce a --compression-method={lz4,gzip,none}, with -Z being a
synonym of --compression-method=gzip. That's at least the path we
chose for pg_receivewal. I don't mind sticking with one way or
another, as what you are proposing is basically the same thing I have
in mind, but both tools ought to use the same set of options.
Did you mean that -z would be a synonym for --compression-method=gzip?
It doesn't really make sense for -Z to be that, unless it's also
setting the compression level.
My objection to --compress=$LEVEL is that the compression level seems
like it ought to rightfully be subordinate to the choice of algorithm.
In general, there's no reason why a compression algorithm has to offer
a choice of compression levels at all, or why they have to be numbered
0 through 9. For example, lz4 on my system claims to offer compression
levels from 1 through 12, plus a separate set of "fast" compression
levels starting with 1 and going up to an unspecified number. And then
it also has options to favor decompression speed, change the block
size, and a few other parameters. We don't necessarily want to expose
all of those options, but we should structure things so that we could
if it became important. The way to do that is to make the compression
algorithm the primary setting, and then anything else you can set for
that compressor is somehow a subordinate setting.
Put another way, we don't decide first that we want to compress with
level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
We pick the compressor first, and then MAYBE think about changing the
compression level.
In the proposed patch, you end up with pg_basebackup
--compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
find that quite odd, though as with all such things, opinions may
vary. In my proposal, that would be an error, because it would be
equivalent to --compress=lz4 --compress=gzip --compression-level=2,
and would thus involve conflicting compression method specifications.It seems to me that you did not read the patch closely enough. The
attached patch does not add support for LZ4 in pg_basebackup on the
client-side yet. Once it gets added, though, the idea is that using
--compress with LZ4 would result in an error. That's what happens
with pg_receivewal on HEAD, for one. The patch just shapes things to
plug LZ4 more easily in the existing code of pg_basebackup.c, and
walmethods.c.
Well what I was looking at was this:
- printf(_(" -Z, --compress=0-9 compress tar output with given
compression level\n"));
+ printf(_(" -Z, --compress=1-9 compress tar output with given
compression level\n"));
+ printf(_(" --compression-method=METHOD\n"
+ " method to compress data\n"));
That seems to show that, post-patch, the argument to -Z would be a
compression level, even if --compression-method were something other
than gzip.
It's possible that I haven't read something carefully enough, but to
me, what I said seems to be a straightforward conclusion based on
looking at the usage help in the patch. So if I came to the wrong
conclusion, perhaps that usage help isn't reflecting the situation you
intend to create, or not as clearly as it ought.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jan 06, 2022 at 09:27:19AM -0500, Robert Haas wrote:
Did you mean that -z would be a synonym for --compression-method=gzip?
It doesn't really make sense for -Z to be that, unless it's also
setting the compression level.
Yes, I meant "-z", not "-Z", to be a synonym of
--compression-method=gzip. Sorry for the typo.
My objection to --compress=$LEVEL is that the compression level seems
like it ought to rightfully be subordinate to the choice of algorithm.
In general, there's no reason why a compression algorithm has to offer
a choice of compression levels at all, or why they have to be numbered
0 through 9. For example, lz4 on my system claims to offer compression
levels from 1 through 12, plus a separate set of "fast" compression
levels starting with 1 and going up to an unspecified number. And then
it also has options to favor decompression speed, change the block
size, and a few other parameters. We don't necessarily want to expose
all of those options, but we should structure things so that we could
if it became important. The way to do that is to make the compression
algorithm the primary setting, and then anything else you can set for
that compressor is somehow a subordinate setting.
For any compression method, that maps to an integer, so.. But I am
not going to fight hard on that.
Put another way, we don't decide first that we want to compress with
level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
We pick the compressor first, and then MAYBE think about changing the
compression level.
Which is why things should be checked once all the options are
processed. I'd recommend that you read the option patch a bit more,
that may help.
Well what I was looking at was this:
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n")); + printf(_(" -Z, --compress=1-9 compress tar output with given compression level\n")); + printf(_(" --compression-method=METHOD\n" + " method to compress data\n"));That seems to show that, post-patch, the argument to -Z would be a
compression level, even if --compression-method were something other
than gzip.
Yes, after the patch --compress would be a compression level. And, if
attempting to use with --compression-method set to "none", or
potentially "lz4", it would just fail. If not using this "gzip", the
compression level is switched to Z_DEFAULT_COMPRESSION. That's this
area of the patch, FWIW:
+ /*
+ * Compression-related options.
+ */
+ switch (compression_method)
It's possible that I haven't read something carefully enough, but to
me, what I said seems to be a straightforward conclusion based on
looking at the usage help in the patch. So if I came to the wrong
conclusion, perhaps that usage help isn't reflecting the situation you
intend to create, or not as clearly as it ought.
Perhaps the --help output could be clearer, then. Do you have a
suggestion?
Bringing walmethods.c at the same page for the directory and the tar
methods was my primary goal here, and the tests are a bonus, so I've
applied this part for now, leaving pg_basebackup alone until we figure
out the layer of options we should use. Perhaps it would be better to
revisit that stuff once the server-side compression has landed.
--
Michael
On Fri, Jan 7, 2022 at 1:43 AM Michael Paquier <michael@paquier.xyz> wrote:
Which is why things should be checked once all the options are
processed. I'd recommend that you read the option patch a bit more,
that may help.
I don't think that the problem here is my lack of understanding. I
have two basic concerns about your proposed patch:
1. If, as you propose, we add a new flag --compression-method=METHOD
then how will the user specify server-side compression?
2. If, as we seem to agree, the compression method is more important
than the compression level, then why is the option to set the
less-important thing called just --compress, and the option to set the
more important thing has a longer name?
I proposed to solve both of these problems by using
--compression-level=NUMBER to set the compression level and
--compress=METHOD or --server-compress=METHOD to set the algorithm and
specify on which side it is to be applied. If, instead of doing that,
we go with what you have proposed here, then I don't understand how to
fit server-side compression into the framework in a reasonably concise
way. I think we would end up with something like pg_basebackup
--compression-method=lz4 --compress-on-server, which seems rather long
and awkward. Do you have a better idea?
I think I understand what the patch is doing. I just think it creates
a problem for my patch. And I'd like to know whether you have an idea
how to solve that problem. And if not, then I'd like you to consider
the solution that I am proposing rather than the patch you've already
got.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jan 13, 2022 at 01:36:00PM -0500, Robert Haas wrote:
1. If, as you propose, we add a new flag --compression-method=METHOD
then how will the user specify server-side compression?
This would require a completely different option switch, which is
basically the same thing as what you are suggesting with
--server-compress.
2. If, as we seem to agree, the compression method is more important
than the compression level, then why is the option to set the
less-important thing called just --compress, and the option to set the
more important thing has a longer name?
I agree that the method is more important than the level for most
users, and I would not mind dropping completely --compress in favor of
something else, which is something I implied upthread.
I proposed to solve both of these problems by using
--compression-level=NUMBER to set the compression level and
--compress=METHOD or --server-compress=METHOD to set the algorithm and
specify on which side it is to be applied. If, instead of doing that,
we go with what you have proposed here, then I don't understand how to
fit server-side compression into the framework in a reasonably concise
way. I think we would end up with something like pg_basebackup
--compression-method=lz4 --compress-on-server, which seems rather long
and awkward. Do you have a better idea?
Using --compression-level=NUMBER and --server-compress=METHOD to
specify a server-side compression method with a level is fine by me,
but I find the reuse of --compress to specify a compression method
confusing as it maps with the past option we have kept in
pg_basebackup for a couple of years now. Based on your suggested set
of options, we could then have a --client-compress=METHOD and
--compression-level=NUMBER to specify a client-side compression method
with a level. If we do that, I guess that we should then:
1) Block the combination of --server-compress and --client-compress.
2) Remove the existing -Z/--compress and -z/--gzip.
You have implied 1) upthread as far as I recall, 2) is something I am
adding on top of it.
I think I understand what the patch is doing. I just think it creates
a problem for my patch. And I'd like to know whether you have an idea
how to solve that problem. And if not, then I'd like you to consider
the solution that I am proposing rather than the patch you've already
got.
I am fine to drop this thread's patch with its set of options and work
on top of your proposal, aka what's drafted two paragraphs above.
--
Michael
On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier <michael@paquier.xyz> wrote:
Using --compression-level=NUMBER and --server-compress=METHOD to
specify a server-side compression method with a level is fine by me,
but I find the reuse of --compress to specify a compression method
confusing as it maps with the past option we have kept in
pg_basebackup for a couple of years now. Based on your suggested set
of options, we could then have a --client-compress=METHOD and
--compression-level=NUMBER to specify a client-side compression method
with a level. If we do that, I guess that we should then:
1) Block the combination of --server-compress and --client-compress.
2) Remove the existing -Z/--compress and -z/--gzip.
I could live with that. I'm not sure that --client-compress instead of
reusing --compress is going to be better ... but I don't think it's
awful so much as just not my first choice. I also don't think it would
be horrid to leave -z, --gzip, and -Z as shorthands for the
--client-compress=gzip with --compression-level also in the last case,
instead of removing all that stuff.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jan 14, 2022 at 04:53:12PM -0500, Robert Haas wrote:
On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier <michael@paquier.xyz> wrote:
Using --compression-level=NUMBER and --server-compress=METHOD to
specify a server-side compression method with a level is fine by me,
but I find the reuse of --compress to specify a compression method
confusing as it maps with the past option we have kept in
pg_basebackup for a couple of years now. Based on your suggested set
of options, we could then have a --client-compress=METHOD and
--compression-level=NUMBER to specify a client-side compression method
with a level. If we do that, I guess that we should then:
1) Block the combination of --server-compress and --client-compress.
2) Remove the existing -Z/--compress and -z/--gzip.I could live with that. I'm not sure that --client-compress instead of
reusing --compress is going to be better ... but I don't think it's
awful so much as just not my first choice. I also don't think it would
be horrid to leave -z, --gzip, and -Z as shorthands for the
--client-compress=gzip with --compression-level also in the last case,
instead of removing all that stuff.
Okay. So, based on this feedback, I guess that something like the
attached would be what we are looking for. I have maximized the
amount of code removed with the removal of -z/-Z, but I won't fight
hard if the consensus is to keep them, either. We could also keep
-z/--gzip, and stick -Z to the new --compression-level with
--compress removed.
--
Michael
Attachments:
v4-0001-Rework-the-option-set-of-pg_basebackup.patchtext/x-diff; charset=us-asciiDownload
From 1bbe624a4bea5e00a1997f81fe3b1ec1f637fa44 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Sat, 15 Jan 2022 11:51:40 +0900
Subject: [PATCH v4] Rework the option set of pg_basebackup
---
src/bin/pg_basebackup/pg_basebackup.c | 99 ++++++++++++++------
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 37 ++++----
doc/src/sgml/ref/pg_basebackup.sgml | 33 ++++---
3 files changed, 105 insertions(+), 64 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index aa43fc0924..57b054a8ad 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -113,6 +113,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static WalCompressionMethod client_method = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -358,8 +359,10 @@ usage(void)
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
- printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" --client-compress=METHOD\n"
+ " method to compress data (client-side)\n"));
+ printf(_(" --compression-level=1-9\n"
+ " compress tar output with given compression level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -524,8 +527,7 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- (compresslevel != 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE,
+ client_method,
compresslevel,
stream.do_sync);
@@ -976,7 +978,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile)
{
- bbstreamer *streamer;
+ bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool must_parse_archive;
@@ -1035,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
+ if (client_method == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (client_method == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/*
* If we need to parse the archive for whatever reason, then we'll
@@ -1745,8 +1750,6 @@ main(int argc, char **argv)
{"slot", required_argument, NULL, 'S'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
- {"gzip", no_argument, NULL, 'z'},
- {"compress", required_argument, NULL, 'Z'},
{"label", required_argument, NULL, 'l'},
{"no-clean", no_argument, NULL, 'n'},
{"no-sync", no_argument, NULL, 'N'},
@@ -1766,6 +1769,8 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"client-compress", required_argument, NULL, 8},
+ {"compression-level", required_argument, NULL, 9},
{NULL, 0, NULL, 0}
};
int c;
@@ -1793,7 +1798,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNd:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -1872,18 +1877,6 @@ main(int argc, char **argv)
case 'N':
do_sync = false;
break;
- case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
- break;
- case 'Z':
- if (!option_parse_int(optarg, "-Z/--compress", 0, 9,
- &compresslevel))
- exit(1);
- break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
fastcheckpoint = true;
@@ -1942,6 +1935,24 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ if (pg_strcasecmp(optarg, "gzip") == 0)
+ client_method = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(optarg, "none") == 0)
+ client_method = COMPRESSION_NONE;
+ else
+ {
+ pg_log_error("invalid value \"%s\" for option %s",
+ optarg, "--client-compress");
+ exit(1);
+ }
+ break;
+ case 9:
+ if (!option_parse_int(optarg, "--compression-level", 1, 9,
+ &compresslevel))
+ exit(1);
+ break;
+ break;
default:
/*
@@ -1979,7 +1990,7 @@ main(int argc, char **argv)
/*
* Mutually exclusive arguments
*/
- if (format == 'p' && compresslevel != 0)
+ if (format == 'p' && client_method != COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2057,13 +2068,39 @@ main(int argc, char **argv)
}
}
-#ifndef HAVE_LIBZ
- if (compresslevel != 0)
+ /*
+ * Compression-related options.
+ */
+ switch (client_method)
{
- pg_log_error("this build does not support compression");
- exit(1);
- }
+ case COMPRESSION_NONE:
+ if (compresslevel != 0)
+ {
+ pg_log_error("cannot use --compression-level with --client-compress=%s",
+ "none");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ break;
+ case COMPRESSION_GZIP:
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ {
+ pg_log_info("no value specified for --compression-level, switching to default");
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ }
+#else
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ break;
+ case COMPRESSION_LZ4:
+ /* option not supported */
+ Assert(false);
+ break;
+ }
if (showprogress && !estimatesize)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 872fec3d35..0f84b3952a 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -31,6 +31,17 @@ my $pgdata = $node->data_dir;
$node->command_fails(['pg_basebackup'],
'pg_basebackup needs target directory specified');
+# Sanity checks for options
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--client-compress',
+ 'none', '--compression-level',
+ '1'
+ ],
+ qr/\Qpg_basebackup: error: cannot use --compression-level with --client-compress=none/,
+ 'failure if --compression-level specified with --client-compress=none');
+
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -630,23 +641,14 @@ note "Testing pg_basebackup with compression methods";
# Check ZLIB compression if available.
SKIP:
{
- skip "postgres was not built with ZLIB support", 5
+ skip "postgres was not built with ZLIB support", 3
if (!check_pg_config("#define HAVE_LIBZ 1"));
$node->command_ok(
[
'pg_basebackup', '-D',
- "$tempdir/backup_gzip", '--compress',
- '1', '--no-sync',
- '--format', 't'
- ],
- 'pg_basebackup with --compress');
- $node->command_ok(
- [
- 'pg_basebackup', '-D',
- "$tempdir/backup_gzip2", '--gzip',
- '--no-sync', '--format',
- 't'
+ "$tempdir/backup_gzip", '--client-compress',
+ 'gzip', '--no-sync', '--format', 't'
],
'pg_basebackup with --gzip');
@@ -654,11 +656,8 @@ SKIP:
# names.
my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
is(scalar(@zlib_files), 2,
- "two files created with --compress (base.tar.gz and pg_wal.tar.gz)");
- my @zlib_files2 = glob "$tempdir/backup_gzip2/*.tar.gz";
- is(scalar(@zlib_files2), 2,
- "two files created with --gzip (base.tar.gz and pg_wal.tar.gz)");
-
+ "two files created with --client-compress=gzip (base.tar.gz and pg_wal.tar.gz)"
+ );
# Check the integrity of the files generated.
my $gzip = $ENV{GZIP_PROGRAM};
skip "program gzip is not found in your system", 1
@@ -666,9 +665,7 @@ SKIP:
|| $gzip eq ''
|| system_log($gzip, '--version') != 0);
- my $gzip_is_valid =
- system_log($gzip, '--test', @zlib_files, @zlib_files2);
+ my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
rmtree("$tempdir/backup_gzip");
- rmtree("$tempdir/backup_gzip2");
}
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..1a859794b2 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -354,28 +354,35 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
- <term><option>-z</option></term>
- <term><option>--gzip</option></term>
+ <term><option>--compression-level=<replaceable class="parameter">level</replaceable></option></term>
<listitem>
<para>
- Enables gzip compression of tar file output, with the default
- compression level. Compression is only available when using
- the tar format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ Controls the level of compression used with the tar file output,
+ and specifies the compression level (1 through 9, 1 being worst
+ compression and 9 being best compression). Compression is only
+ available when using the tar format, and the suffix
+ <filename>.gz</filename> will automatically be added to all tar
+ filenames.
+ </para>
+ <para>
+ This option is only available with <literal>--client-compress</literal>
+ set to <literal>gzip</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
+ <term><option>--client-compress=<replaceable class="parameter">method</replaceable></option></term>
<listitem>
<para>
- Enables gzip compression of tar file output, and specifies the
- compression level (0 through 9, 0 being no compression and 9 being best
- compression). Compression is only available when using the tar
- format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ Enables client-side compression of backup data using the specified
+ method. Supported values are <literal>gzip</literal> and
+ <literal>none</literal>, the default.
+ </para>
+
+ <para>
+ The suffix <filename>.gz</filename> will automatically be added to
+ all filenames when using <literal>gzip</literal>.
</para>
</listitem>
</varlistentry>
--
2.34.1
On Fri, Jan 14, 2022 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier <michael@paquier.xyz> wrote:
Using --compression-level=NUMBER and --server-compress=METHOD to
specify a server-side compression method with a level is fine by me,
but I find the reuse of --compress to specify a compression method
confusing as it maps with the past option we have kept in
pg_basebackup for a couple of years now. Based on your suggested set
of options, we could then have a --client-compress=METHOD and
--compression-level=NUMBER to specify a client-side compression method
with a level. If we do that, I guess that we should then:
1) Block the combination of --server-compress and --client-compress.
2) Remove the existing -Z/--compress and -z/--gzip.I could live with that. I'm not sure that --client-compress instead of
reusing --compress is going to be better ... but I don't think it's
awful so much as just not my first choice. I also don't think it would
be horrid to leave -z, --gzip, and -Z as shorthands for the
--client-compress=gzip with --compression-level also in the last case,
instead of removing all that stuff.
It never makes sense to compress *both* in server and client, right?
One argument in that case for using --compress would be that we could
have that one take options like --compress=gzip (use gzip in the
client) and --compress=server-lz4 (use lz4 on the server), and
automatically make it impossible to do both. And maybe also accept
--compress=client-gzip (which would be the same as just specifying
gzip).
That would be an argument for actually keeping --compress and not
using --client-compress, because obviously it would be silly to have
--client-compress=server-lz4...
And yes, I agree that considering both server and client compression
even if we don't have server compression makes sense, since we don't
want to change things around again when we get it.
We could perhaps also consider accepting --compress=gzip:7
(<method>:<level>) as a way to specify the level, for both client and
server side.
I think having --client-compress and --server-compress separately but
having --compression-level *not* being separate would be confusing and
I *think* that's what the current patch proposes?
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Sat, Jan 15, 2022 at 04:15:26PM +0100, Magnus Hagander wrote:
I think having --client-compress and --server-compress separately but
having --compression-level *not* being separate would be confusing and
I *think* that's what the current patch proposes?
Yep, your understanding is right. The last version of the patch
posted does exactly that.
--
Michael
On Fri, Jan 14, 2022 at 9:54 PM Michael Paquier <michael@paquier.xyz> wrote:
Okay. So, based on this feedback, I guess that something like the
attached would be what we are looking for. I have maximized the
amount of code removed with the removal of -z/-Z, but I won't fight
hard if the consensus is to keep them, either. We could also keep
-z/--gzip, and stick -Z to the new --compression-level with
--compress removed.
I mean, I really don't understand the benefit of removing -z and -Z.
-z can remain a synonym for --client-compress=gzip and -Z for
--client-compress=gzip --compression-level=$N and nobody will be
harmed. Taking them out reduces backward compatibility for no gain
that I can see.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jan 15, 2022 at 10:15 AM Magnus Hagander <magnus@hagander.net> wrote:
It never makes sense to compress *both* in server and client, right?
Yeah.
One argument in that case for using --compress would be that we could
have that one take options like --compress=gzip (use gzip in the
client) and --compress=server-lz4 (use lz4 on the server), and
automatically make it impossible to do both. And maybe also accept
--compress=client-gzip (which would be the same as just specifying
gzip).That would be an argument for actually keeping --compress and not
using --client-compress, because obviously it would be silly to have
--client-compress=server-lz4...
I still like distinguishing it using the option name, but differently:
--compress=METHOD and --server-compress=METHOD. But this is also a
reasonable proposal.
And yes, I agree that considering both server and client compression
even if we don't have server compression makes sense, since we don't
want to change things around again when we get it.
Especially not because I'm pretty close to having a committable patch
and intend to try to get this into v15. See the refactoring
basebackup.c thread.
We could perhaps also consider accepting --compress=gzip:7
(<method>:<level>) as a way to specify the level, for both client and
server side.
That's not crazy either. Initially I was thinking --compression=gzip7
but then it turns out lz4 is one of the methods we want to use, and
lz47 would be, err, slightly unclear. lz4:7 is better, for sure.
I think having --client-compress and --server-compress separately but
having --compression-level *not* being separate would be confusing and
I *think* that's what the current patch proposes?
Depends on what you mean by "separate". There's no proposal to have
--client-compression-level and also --server-compression-level.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jan 17, 2022 at 3:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Jan 15, 2022 at 10:15 AM Magnus Hagander <magnus@hagander.net> wrote:
It never makes sense to compress *both* in server and client, right?
Yeah.
One argument in that case for using --compress would be that we could
have that one take options like --compress=gzip (use gzip in the
client) and --compress=server-lz4 (use lz4 on the server), and
automatically make it impossible to do both. And maybe also accept
--compress=client-gzip (which would be the same as just specifying
gzip).That would be an argument for actually keeping --compress and not
using --client-compress, because obviously it would be silly to have
--client-compress=server-lz4...I still like distinguishing it using the option name, but differently:
--compress=METHOD and --server-compress=METHOD. But this is also a
reasonable proposal.And yes, I agree that considering both server and client compression
even if we don't have server compression makes sense, since we don't
want to change things around again when we get it.Especially not because I'm pretty close to having a committable patch
and intend to try to get this into v15. See the refactoring
basebackup.c thread.We could perhaps also consider accepting --compress=gzip:7
(<method>:<level>) as a way to specify the level, for both client and
server side.That's not crazy either. Initially I was thinking --compression=gzip7
but then it turns out lz4 is one of the methods we want to use, and
lz47 would be, err, slightly unclear. lz4:7 is better, for sure.I think having --client-compress and --server-compress separately but
having --compression-level *not* being separate would be confusing and
I *think* that's what th e current patch proposes?Depends on what you mean by "separate". There's no proposal to have
--client-compression-level and also --server-compression-level.
I mean that I think it would be confusing to have
--client-compression=x, --server-compression=y, and
compression-level=z as the options. Why, in that scenario, does the
"compression" part get two parameters, but the "compression level"
part get one. In that case, there should either be --compression=x and
--compression-level=z (which is what I'd suggest, per above), or there
should be --client-compression, --server-compression,
--client-compression-level and --server-compression-level, for it to
be consistent. But having one of them be split in two parameters and
the other one not, is what I'd consider confusing.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Mon, Jan 17, 2022 at 4:56 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jan 15, 2022 at 04:15:26PM +0100, Magnus Hagander wrote:
I think having --client-compress and --server-compress separately but
having --compression-level *not* being separate would be confusing and
I *think* that's what the current patch proposes?Yep, your understanding is right. The last version of the patch
posted does exactly that.
Ok. Then that is exactly what I think is confusing, and thus object to :)
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Mon, Jan 17, 2022 at 9:27 AM Magnus Hagander <magnus@hagander.net> wrote:
I mean that I think it would be confusing to have
--client-compression=x, --server-compression=y, and
compression-level=z as the options. Why, in that scenario, does the
"compression" part get two parameters, but the "compression level"
part get one. In that case, there should either be --compression=x and
--compression-level=z (which is what I'd suggest, per above), or there
should be --client-compression, --server-compression,
--client-compression-level and --server-compression-level, for it to
be consistent. But having one of them be split in two parameters and
the other one not, is what I'd consider confusing.
I don't find that confusing, but confusion is a pretty subjective
experience so that doesn't really prove anything. Of the two
alternatives that you propose, I prefer --compress=["server-"]METHOD
and --compression-level=NUMBER to having both
--client-compression-level and --server-compression-level. To me,
that's still a bit more surprising than my proposal, because having
the client compress stuff and having the server compress stuff feel
like somewhat different kinds of things ... but it's unsurprising that
I like my own proposal, and what really matters is that we converge
relatively quickly on something we can all live with.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2022-Jan-17, Robert Haas wrote:
Of the two
alternatives that you propose, I prefer --compress=["server-"]METHOD
and --compression-level=NUMBER to having both
--client-compression-level and --server-compression-level. To me,
that's still a bit more surprising than my proposal, because having
the client compress stuff and having the server compress stuff feel
like somewhat different kinds of things ... but it's unsurprising that
I like my own proposal, and what really matters is that we converge
relatively quickly on something we can all live with.
I think having a single option where you specify everything is simpler.
I propose we accept these forms:
--compress=[{server,client}-]method[:level] new in 15
--compress=level (accepted by 14)
-Z level (accepted by 14)
-z (accepted by 14)
This way, compatibility with the existing release is maintained; and we
introduce all the new functionality without cluttering the interface.
So starting from 15, in addition to the already supported forms, users
will be able to do
--compress=server-gzip:8 (completely specified options)
--compress=client-lz4 (client-side lz4 compression, default level)
--compress=zstd (server-side zstd compression)
there's a bit of string parsing required to implement, but that seems
okay to me -- the UI seems clear enough and easily documented.
One missing feature in this spec is the ability to specify compression
to be used with whatever the default method is. I'm not sure we want to
allow for that, but it could be
--compress=client
--compress=server
which uses whatever method is default, with whatever level is default,
at either side.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
On Mon, Jan 17, 2022 at 8:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 17, 2022 at 9:27 AM Magnus Hagander <magnus@hagander.net>
wrote:I mean that I think it would be confusing to have
--client-compression=x, --server-compression=y, and
compression-level=z as the options. Why, in that scenario, does the
"compression" part get two parameters, but the "compression level"
part get one. In that case, there should either be --compression=x and
--compression-level=z (which is what I'd suggest, per above), or there
should be --client-compression, --server-compression,
--client-compression-level and --server-compression-level, for it to
be consistent. But having one of them be split in two parameters and
the other one not, is what I'd consider confusing.I don't find that confusing, but confusion is a pretty subjective
experience so that doesn't really prove anything. Of the two
alternatives that you propose, I prefer --compress=["server-"]METHOD
and --compression-level=NUMBER to having both
--client-compression-level and --server-compression-level. To me,
that's still a bit more surprising than my proposal, because having
the client compress stuff and having the server compress stuff feel
like somewhat different kinds of things ... but it's unsurprising that
I like my own proposal, and what really matters is that we converge
relatively quickly on something we can all live with.
Quick look-over of the email thread:
The bare "--compress" option isn't liked anymore. I would prefer that we
officially deprecate -z, -Z, and --compress but otherwise leave them alone
for backward compatibility.
We do not want to entertain performing both server and client compression.
It thus seems undesirable to have different sets of options for them.
Therefore:
--compression-method={gzip|lz4|...}
--compression-level={string} (which can be any string value, the validation
logic for compression-method will evaluate what is provided and error if it
is not happy, each method would have its own default)
--compression-location={client|server} (Can be added once server
compression is active. I would suggest it would default to server-side
compression - which would be a change in behavior by necessity)
If you really want a concise option here I say we make available:
--compression={method}[;string][;{client|server}]
The two trailing optional (with default) sub-arguments are unambiguous as
to which one is present if only two sub-arguments are provided.
David J.
On Mon, Jan 17, 2022 at 8:41 AM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:
On 2022-Jan-17, Robert Haas wrote:
Of the two
alternatives that you propose, I prefer --compress=["server-"]METHOD
and --compression-level=NUMBER to having both
--client-compression-level and --server-compression-level. To me,
that's still a bit more surprising than my proposal, because having
the client compress stuff and having the server compress stuff feel
like somewhat different kinds of things ... but it's unsurprising that
I like my own proposal, and what really matters is that we converge
relatively quickly on something we can all live with.I think having a single option where you specify everything is simpler.
I propose we accept these forms:--compress=[{server,client}-]method[:level] new in 15
--compress=level (accepted by 14)
-Z level (accepted by 14)
-z (accepted by 14)
I am also in favor of this option. Whether this is better than deprecating
--compress and introducing --compression I am having trouble deciding. My
personal preference is to add --compression and leave --compress alone and
deprecated; but we don't usually do anything with deprecations and having
users seeing both --compress and --compression out in the wild, even if
never at the same time, is bound to elicit questions (though so is seeing
--compress with "number only" rules and "composite value" rules...)
This way, compatibility with the existing release is maintained; and we
introduce all the new functionality without cluttering the interface.
I would still "clutter" the interface with:
--compress-method
--compress-options (extending from my prior post, I would make this more
generic - i.e., not named "level" - and deal with valid values, meaning,
and format, in a per-method description in the documentation)
--compress-location
Users have different preferences for what they want to use, and it provides
a level of self-documentation for the composite specification and a degree
of explicitness for the actual documentation of the methods.
One missing feature in this spec is the ability to specify compression
to be used with whatever the default method is. I'm not sure we want to
allow for that
I'm not too keen on making a default method in code. Saying "if in doubt
gzip is a widely used compression method." in the documentation seems
sufficient.
David J.
On Mon, Jan 17, 2022 at 11:50 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
I think having a single option where you specify everything is simpler.
I propose we accept these forms:--compress=[{server,client}-]method[:level] new in 15
--compress=level (accepted by 14)
-Z level (accepted by 14)
-z (accepted by 14)I am also in favor of this option. Whether this is better than deprecating --compress and introducing --compression I am having trouble deciding. My personal preference is to add --compression and leave --compress alone and deprecated; but we don't usually do anything with deprecations and having users seeing both --compress and --compression out in the wild, even if never at the same time, is bound to elicit questions (though so is seeing --compress with "number only" rules and "composite value" rules...)
Alvaro's proposal is fine with me. I don't see any value in replacing
--compress with --compression. It's longer but not superior in any way
that I can see. Having both seems worst of all -- that's just
confusing.
I'm not too keen on making a default method in code. Saying "if in doubt gzip is a widely used compression method." in the documentation seems sufficient.
Yeah, I agree that a default method doesn't seem necessary. People who
want to compress without thinking hard can use -z; others can say what
they want.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jan 17, 2022 at 12:48:12PM -0500, Robert Haas wrote:
Alvaro's proposal is fine with me. I don't see any value in replacing
--compress with --compression. It's longer but not superior in any way
that I can see. Having both seems worst of all -- that's just
confusing.
Okay, that looks like a consensus, then. Robert, would it be better
to gather all that on the thread that deals with the server-side
compression? Doing that here would be fine by me, with the option to
only specify the client. Now it would be a bit weird to do things
with only the client part and not the server part :)
--
Michael
On Mon, Jan 17, 2022 at 8:36 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jan 17, 2022 at 12:48:12PM -0500, Robert Haas wrote:
Alvaro's proposal is fine with me. I don't see any value in replacing
--compress with --compression. It's longer but not superior in any way
that I can see. Having both seems worst of all -- that's just
confusing.Okay, that looks like a consensus, then. Robert, would it be better
to gather all that on the thread that deals with the server-side
compression? Doing that here would be fine by me, with the option to
only specify the client. Now it would be a bit weird to do things
with only the client part and not the server part :)
I think it could make sense for you implement
--compress=METHOD[:LEVEL], keeping -z and -Z N as synonyms for
--compress=gzip and --compress=gzip:N, and with --compress=N being
interpreted as --compress=gzip:N. Then I'll generalize that to
--compress=[{client|server}-]METHOD[:LEVEL] on the other thread. I
think we should leave it where, if you don't say either client or
server, you get client, because that's the historical behavior.
If that doesn't work for you, please let me know what you would prefer.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 18, 2022 at 10:04:56AM -0500, Robert Haas wrote:
I think it could make sense for you implement
--compress=METHOD[:LEVEL], keeping -z and -Z N as synonyms for
--compress=gzip and --compress=gzip:N, and with --compress=N being
interpreted as --compress=gzip:N. Then I'll generalize that to
--compress=[{client|server}-]METHOD[:LEVEL] on the other thread. I
think we should leave it where, if you don't say either client or
server, you get client, because that's the historical behavior.If that doesn't work for you, please let me know what you would prefer.
WFM. Attached is a patch that extends --compress to handle a method
with an optional compression level. Some extra tests are added to
cover all that.
Thoughts?
--
Michael
Attachments:
v5-0001-Rework-the-option-set-of-pg_basebackup.patchtext/x-diff; charset=us-asciiDownload
From abf0848b0975f5a33ecaee725ab616b92b914820 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 19 Jan 2022 13:27:00 +0900
Subject: [PATCH v5] Rework the option set of pg_basebackup
---
src/bin/pg_basebackup/pg_basebackup.c | 138 ++++++++++++++++---
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 46 ++++++-
doc/src/sgml/ref/pg_basebackup.sgml | 21 ++-
3 files changed, 182 insertions(+), 23 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2a58be638a..734fab4e43 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -123,6 +123,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static WalCompressionMethod compressmethod = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -376,7 +377,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -Z, --compress=[{gzip,none}[:LEVEL] or [LEVEL]\n"
+ " compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -541,8 +543,7 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- (compresslevel != 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE,
+ compressmethod,
compresslevel,
stream.do_sync);
@@ -933,6 +934,78 @@ parse_max_rate(char *src)
return (int32) result;
}
+/*
+ * Utility wrapper to parse the values specified for -Z/--compress.
+ * *methodres and *levelres will be optionally filled with values coming
+ * from the parsed results.
+ */
+static void
+parse_compress_options(char *src, WalCompressionMethod *methodres,
+ int *levelres)
+{
+ /* First check after the compression method */
+ if (strncmp(src, "gzip", 4) == 0 ||
+ strncmp(src, "none", 4) == 0)
+ {
+ char *p = src;
+
+ if (strncmp(src, "gzip", 4) == 0)
+ {
+ *methodres = COMPRESSION_GZIP;
+ p += 4;
+ }
+ else if (strncmp(src, "none", 4) == 0)
+ {
+ *methodres = COMPRESSION_NONE;
+ p += 4;
+ }
+
+ /* If the option has no more data, we are done */
+ if (*p == '\0')
+ return;
+
+ /*
+ * Specifying the level is optional, and it should be prefixed
+ * by a colon.
+ */
+ if (*p != ':')
+ {
+ pg_log_error("invalid separator found to specify compression level: expected ':' but found '%c'",
+ *p);
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ p++;
+
+ if (*p == '\0')
+ {
+ pg_log_error("empty value for option %s", "-Z/--compress");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ if (!option_parse_int(p, "-Z/--compress", 0, 9,
+ levelres))
+ exit(1);
+ return;
+ }
+
+ /*
+ * If the beginning of the option value does not match any of the methods
+ * supported, check for an integer. This is backward-compatible with the
+ * pre-14 behavior where a caller could define a compression level of
+ * 0 for no compression, and of [1,9] for gzip compression.
+ */
+ if (!option_parse_int(src, "-Z/--compress", 0, 9,
+ levelres))
+ exit(1);
+
+ *methodres = (*levelres > 0) ?
+ COMPRESSION_GZIP : COMPRESSION_NONE;
+}
+
/*
* Read a stream of COPY data and invoke the provided callback for each
* chunk.
@@ -993,7 +1066,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile)
{
- bbstreamer *streamer;
+ bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool must_parse_archive;
@@ -1052,19 +1125,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
+ if (compressmethod == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compressmethod == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/*
* If we need to parse the archive for whatever reason, then we'll
@@ -2220,11 +2296,11 @@ main(int argc, char **argv)
#else
compresslevel = 1; /* will be rejected below */
#endif
+ compressmethod = COMPRESSION_GZIP;
break;
case 'Z':
- if (!option_parse_int(optarg, "-Z/--compress", 0, 9,
- &compresslevel))
- exit(1);
+ parse_compress_options(optarg, &compressmethod,
+ &compresslevel);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2321,7 +2397,7 @@ main(int argc, char **argv)
/*
* Mutually exclusive arguments
*/
- if (format == 'p' && compresslevel != 0)
+ if (format == 'p' && compressmethod != COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2399,13 +2475,39 @@ main(int argc, char **argv)
}
}
-#ifndef HAVE_LIBZ
- if (compresslevel != 0)
+ /*
+ * Compression-related checks.
+ */
+ switch (compressmethod)
{
- pg_log_error("this build does not support compression");
- exit(1);
- }
+ case COMPRESSION_NONE:
+ if (compresslevel != 0)
+ {
+ pg_log_error("cannot use compression level with method %s",
+ "none");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ break;
+ case COMPRESSION_GZIP:
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ {
+ pg_log_info("no value specified for compression level, switching to default");
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ }
+#else
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ break;
+ case COMPRESSION_LZ4:
+ /* option not supported */
+ Assert(false);
+ break;
+ }
if (showprogress && !estimatesize)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f0243f28d4..0ddbbee818 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 115;
+use Test::More tests => 123;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -38,6 +38,32 @@ my $pgdata = $node->data_dir;
$node->command_fails(['pg_basebackup'],
'pg_basebackup needs target directory specified');
+# Sanity checks for options
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compress',
+ 'none:1'
+ ],
+ qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ 'failure if method "none" specified with compression level');
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compress',
+ 'none+'
+ ],
+ qr/\Qpg_basebackup: error: invalid separator found to specify compression level/,
+ 'failure on incorrect separator to define compression level');
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compress',
+ 'none:'
+ ],
+ qr/\Qpg_basebackup: error: empty value for option/,
+ 'failure on missing compression level value');
+
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -637,7 +663,7 @@ note "Testing pg_basebackup with compression methods";
# Check ZLIB compression if available.
SKIP:
{
- skip "postgres was not built with ZLIB support", 5
+ skip "postgres was not built with ZLIB support", 7
if (!check_pg_config("#define HAVE_LIBZ 1"));
$node->command_ok(
@@ -655,15 +681,26 @@ SKIP:
'--format', 't'
],
'pg_basebackup with --gzip');
+ $node->command_ok(
+ [
+ @pg_basebackup_defs, '-D',
+ "$tempdir/backup_gzip3", '--compress',
+ 'gzip:1',
+ '--format', 't'
+ ],
+ 'pg_basebackup with --compress=gzip:1');
# Verify that the stored files are generated with their expected
# names.
my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
is(scalar(@zlib_files), 2,
- "two files created with --compress (base.tar.gz and pg_wal.tar.gz)");
+ "two files created with --compress=NUM (base.tar.gz and pg_wal.tar.gz)");
my @zlib_files2 = glob "$tempdir/backup_gzip2/*.tar.gz";
is(scalar(@zlib_files2), 2,
"two files created with --gzip (base.tar.gz and pg_wal.tar.gz)");
+ my @zlib_files3 = glob "$tempdir/backup_gzip3/*.tar.gz";
+ is(scalar(@zlib_files3), 2,
+ "two files created with --compress=gzip:NUM (base.tar.gz and pg_wal.tar.gz)");
# Check the integrity of the files generated.
my $gzip = $ENV{GZIP_PROGRAM};
@@ -673,8 +710,9 @@ SKIP:
|| system_log($gzip, '--version') != 0);
my $gzip_is_valid =
- system_log($gzip, '--test', @zlib_files, @zlib_files2);
+ system_log($gzip, '--test', @zlib_files, @zlib_files2, @zlib_files3);
is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
rmtree("$tempdir/backup_gzip");
rmtree("$tempdir/backup_gzip2");
+ rmtree("$tempdir/backup_gzip3");
}
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..1944647707 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -368,15 +368,24 @@ PostgreSQL documentation
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
+ <term><option>-Z [<replaceable class="parameter">method</replaceable></option>:[=<replaceable>level</replaceable>]]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
+ <term><option>--compress=[<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]]</term>
<listitem>
<para>
- Enables gzip compression of tar file output, and specifies the
+ Enables compression of tar file output, and specifies the
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
automatically be added to all tar filenames.
</para>
+ <para>
+ The compression method can be set to either <literal>gzip</literal>
+ for compression with <application>gzip</application>, or
+ <literal>none</literal> for no compression. A compression level
+ can be optionally specified, by appending the level number after a
+ colon (<literal>:</literal>).
+ </para>
</listitem>
</varlistentry>
</variablelist>
@@ -912,6 +921,16 @@ PostgreSQL documentation
<screen>
<prompt>$</prompt> <userinput>pg_basebackup -D backup/data -T /opt/ts=$(pwd)/backup/ts</userinput>
</screen></para>
+
+ <para>
+ To create a backup of a local server with one tar file for each tablespace
+ compressed with <application>gzip</application> at level 9, stored in the
+ directory <filename>backup</filename>:
+<screen>
+<prompt>$</prompt> <userinput>pg_basebackup -D backup -Ft --compress=gzip:9</userinput>
+</screen>
+ </para>
+
</refsect1>
<refsect1>
--
2.34.1
On Tue, Jan 18, 2022 at 11:27 PM Michael Paquier <michael@paquier.xyz> wrote:
WFM. Attached is a patch that extends --compress to handle a method
with an optional compression level. Some extra tests are added to
cover all that.
I think that this will reject something like --compress=nonetheless by
telling you that 't' is not a valid separator. I think it would be
better to code this so that we first identify the portion preceding
the first colon, or the whole string if there is no colon. Then we
check whether that part is a compression method that we recognize. If
not, we complain. If so, we then check whatever is after the separator
for validity - and this might differ by type. For example, we could
then immediately reject none:4, and if in the future we want to allow
lz4:fast3, we could.
I think the code that handles the bare integer case should be at the
top of the function and should return, because that code is short.
Then the rest of the function doesn't need to be indented as deeply.
"First check after the compression method" seems like it would be
better written "First check for the compression method" or "First
check the compression method".
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2022-Jan-19, Michael Paquier wrote:
+ printf(_(" -Z, --compress=[{gzip,none}[:LEVEL] or [LEVEL]\n" + " compress tar output with given compression method or level\n"));
Note there is an extra [ before the {gzip bit.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)
On Wed, Jan 19, 2022 at 12:50:44PM -0300, Alvaro Herrera wrote:
Note there is an extra [ before the {gzip bit.
Thanks!
--
Michael
On Wed, Jan 19, 2022 at 08:35:23AM -0500, Robert Haas wrote:
I think that this will reject something like --compress=nonetheless by
telling you that 't' is not a valid separator. I think it would be
better to code this so that we first identify the portion preceding
the first colon, or the whole string if there is no colon. Then we
check whether that part is a compression method that we recognize. If
not, we complain.
Well, if no colon is specified, we still need to check if optarg
is a pure integer if it does not match any of the supported methods,
as --compress=0 should be backward compatible with no compression and
--compress=1~9 should imply gzip, no?
If so, we then check whatever is after the separator
for validity - and this might differ by type. For example, we could
then immediately reject none:4, and if in the future we want to allow
lz4:fast3, we could.
Okay.
I think the code that handles the bare integer case should be at the
top of the function and should return, because that code is short.
Then the rest of the function doesn't need to be indented as deeply.
Done this way, I hope.
--
Michael
Attachments:
v6-0001-Rework-the-option-set-of-pg_basebackup.patchtext/x-diff; charset=us-asciiDownload
From b5d84d8daac4c537657bfeef2ed0b5550106ffc2 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 20 Jan 2022 15:59:54 +0900
Subject: [PATCH v6] Rework the option set of pg_basebackup
---
src/bin/pg_basebackup/pg_basebackup.c | 150 ++++++++++++++++---
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 46 +++++-
doc/src/sgml/ref/pg_basebackup.sgml | 21 ++-
3 files changed, 194 insertions(+), 23 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2a58be638a..ae003766cc 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -123,6 +123,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static WalCompressionMethod compressmethod = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -376,7 +377,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+ " compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -541,8 +543,7 @@ LogStreamerMain(logstreamer_param *param)
stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- (compresslevel != 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE,
+ compressmethod,
compresslevel,
stream.do_sync);
@@ -933,6 +934,84 @@ parse_max_rate(char *src)
return (int32) result;
}
+/*
+ * Utility wrapper to parse the values specified for -Z/--compress.
+ * *methodres and *levelres will be optionally filled with values coming
+ * from the parsed results.
+ */
+static void
+parse_compress_options(char *src, WalCompressionMethod *methodres,
+ int *levelres)
+{
+ char *sep;
+ int firstlen;
+ char *firstpart = NULL;
+
+ /* check if the option is split in two */
+ sep = strchr(src, ':');
+
+ /*
+ * The first part of the option value could be a method name, or just
+ * a level value.
+ */
+ firstlen = (sep != NULL) ? (sep - src) : strlen(src);
+ firstpart = pg_malloc(firstlen + 1);
+ strncpy(firstpart, src, firstlen);
+ firstpart[firstlen] = '\0';
+
+ /*
+ * Check if the first part of the string matches with a supported
+ * compression method.
+ */
+ if (pg_strcasecmp(firstpart, "gzip") != 0 &&
+ pg_strcasecmp(firstpart, "none") != 0)
+ {
+ /*
+ * It does not match anything known, so check for the
+ * backward-compatible case of only an integer, where the implied
+ * compression method changes depending on the level value.
+ */
+ if (!option_parse_int(firstpart, "-Z/--compress", 0,
+ INT_MAX, levelres))
+ exit(1);
+
+ *methodres = (*levelres > 0) ?
+ COMPRESSION_GZIP : COMPRESSION_NONE;
+ return;
+ }
+
+ /* Supported method found. */
+ if (pg_strcasecmp(firstpart, "gzip") == 0)
+ *methodres = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(firstpart, "none") == 0)
+ *methodres = COMPRESSION_NONE;
+
+ if (sep == NULL)
+ {
+ /*
+ * The caller specified a method without a colon separator, so let
+ * any subsequent checks assign a default.
+ */
+ return;
+ }
+
+ /* Check the contents after the colon separator. */
+ sep++;
+ if (*sep == '\0')
+ {
+ pg_log_error("no compression level defined for method %s", firstpart);
+ exit(1);
+ }
+
+ /*
+ * For any of the methods currently supported, the data after the
+ * separator can just be an integer.
+ */
+ if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
+ levelres))
+ exit(1);
+}
+
/*
* Read a stream of COPY data and invoke the provided callback for each
* chunk.
@@ -993,7 +1072,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile)
{
- bbstreamer *streamer;
+ bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool must_parse_archive;
@@ -1052,19 +1131,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
+ if (compressmethod == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compressmethod == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }
/*
* If we need to parse the archive for whatever reason, then we'll
@@ -2220,11 +2302,11 @@ main(int argc, char **argv)
#else
compresslevel = 1; /* will be rejected below */
#endif
+ compressmethod = COMPRESSION_GZIP;
break;
case 'Z':
- if (!option_parse_int(optarg, "-Z/--compress", 0, 9,
- &compresslevel))
- exit(1);
+ parse_compress_options(optarg, &compressmethod,
+ &compresslevel);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2321,7 +2403,7 @@ main(int argc, char **argv)
/*
* Mutually exclusive arguments
*/
- if (format == 'p' && compresslevel != 0)
+ if (format == 'p' && compressmethod != COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2399,13 +2481,45 @@ main(int argc, char **argv)
}
}
-#ifndef HAVE_LIBZ
- if (compresslevel != 0)
+ /*
+ * Compression-related checks.
+ */
+ switch (compressmethod)
{
- pg_log_error("this build does not support compression");
- exit(1);
- }
+ case COMPRESSION_NONE:
+ if (compresslevel != 0)
+ {
+ pg_log_error("cannot use compression level with method %s",
+ "none");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ break;
+ case COMPRESSION_GZIP:
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ {
+ pg_log_info("no value specified for compression level, switching to default");
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ }
+ if (compresslevel > 9)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 9",
+ compresslevel, "gzip");
+ exit(1);
+ }
+#else
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ break;
+ case COMPRESSION_LZ4:
+ /* option not supported */
+ Assert(false);
+ break;
+ }
if (showprogress && !estimatesize)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f0243f28d4..2621296a49 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 115;
+use Test::More tests => 123;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -38,6 +38,32 @@ my $pgdata = $node->data_dir;
$node->command_fails(['pg_basebackup'],
'pg_basebackup needs target directory specified');
+# Sanity checks for options
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compress',
+ 'none:1'
+ ],
+ qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ 'failure if method "none" specified with compression level');
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compress',
+ 'none+'
+ ],
+ qr/\Qpg_basebackup: error: invalid value "none+" for option/,
+ 'failure on incorrect separator to define compression level');
+$node->command_fails_like(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup", '--compress',
+ 'none:'
+ ],
+ qr/\Qpg_basebackup: error: no compression level defined for method none/,
+ 'failure on missing compression level value');
+
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
if (open my $badchars, '>>', "$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
@@ -637,7 +663,7 @@ note "Testing pg_basebackup with compression methods";
# Check ZLIB compression if available.
SKIP:
{
- skip "postgres was not built with ZLIB support", 5
+ skip "postgres was not built with ZLIB support", 7
if (!check_pg_config("#define HAVE_LIBZ 1"));
$node->command_ok(
@@ -655,15 +681,26 @@ SKIP:
'--format', 't'
],
'pg_basebackup with --gzip');
+ $node->command_ok(
+ [
+ @pg_basebackup_defs, '-D',
+ "$tempdir/backup_gzip3", '--compress',
+ 'gzip:1',
+ '--format', 't'
+ ],
+ 'pg_basebackup with --compress=gzip:1');
# Verify that the stored files are generated with their expected
# names.
my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
is(scalar(@zlib_files), 2,
- "two files created with --compress (base.tar.gz and pg_wal.tar.gz)");
+ "two files created with --compress=NUM (base.tar.gz and pg_wal.tar.gz)");
my @zlib_files2 = glob "$tempdir/backup_gzip2/*.tar.gz";
is(scalar(@zlib_files2), 2,
"two files created with --gzip (base.tar.gz and pg_wal.tar.gz)");
+ my @zlib_files3 = glob "$tempdir/backup_gzip3/*.tar.gz";
+ is(scalar(@zlib_files3), 2,
+ "two files created with --compress=gzip:NUM (base.tar.gz and pg_wal.tar.gz)");
# Check the integrity of the files generated.
my $gzip = $ENV{GZIP_PROGRAM};
@@ -673,8 +710,9 @@ SKIP:
|| system_log($gzip, '--version') != 0);
my $gzip_is_valid =
- system_log($gzip, '--test', @zlib_files, @zlib_files2);
+ system_log($gzip, '--test', @zlib_files, @zlib_files2, @zlib_files3);
is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
rmtree("$tempdir/backup_gzip");
rmtree("$tempdir/backup_gzip2");
+ rmtree("$tempdir/backup_gzip3");
}
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..feabd637eb 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -368,15 +368,24 @@ PostgreSQL documentation
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
+ <term><option>-Z <replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
+ <term><option>--compress=<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<listitem>
<para>
- Enables gzip compression of tar file output, and specifies the
+ Enables compression of tar file output, and specifies the
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
automatically be added to all tar filenames.
</para>
+ <para>
+ The compression method can be set to either <literal>gzip</literal>
+ for compression with <application>gzip</application>, or
+ <literal>none</literal> for no compression. A compression level
+ can be optionally specified, by appending the level number after a
+ colon (<literal>:</literal>).
+ </para>
</listitem>
</varlistentry>
</variablelist>
@@ -912,6 +921,16 @@ PostgreSQL documentation
<screen>
<prompt>$</prompt> <userinput>pg_basebackup -D backup/data -T /opt/ts=$(pwd)/backup/ts</userinput>
</screen></para>
+
+ <para>
+ To create a backup of a local server with one tar file for each tablespace
+ compressed with <application>gzip</application> at level 9, stored in the
+ directory <filename>backup</filename>:
+<screen>
+<prompt>$</prompt> <userinput>pg_basebackup -D backup -Ft --compress=gzip:9</userinput>
+</screen>
+ </para>
+
</refsect1>
<refsect1>
--
2.34.1
On Thu, Jan 20, 2022 at 2:03 AM Michael Paquier <michael@paquier.xyz> wrote:
Well, if no colon is specified, we still need to check if optarg
is a pure integer if it does not match any of the supported methods,
as --compress=0 should be backward compatible with no compression and
--compress=1~9 should imply gzip, no?
Yes.
Done this way, I hope.
This looks better, but this part could be switched around:
+ /*
+ * Check if the first part of the string matches with a supported
+ * compression method.
+ */
+ if (pg_strcasecmp(firstpart, "gzip") != 0 &&
+ pg_strcasecmp(firstpart, "none") != 0)
+ {
+ /*
+ * It does not match anything known, so check for the
+ * backward-compatible case of only an integer, where the implied
+ * compression method changes depending on the level value.
+ */
+ if (!option_parse_int(firstpart, "-Z/--compress", 0,
+ INT_MAX, levelres))
+ exit(1);
+
+ *methodres = (*levelres > 0) ?
+ COMPRESSION_GZIP : COMPRESSION_NONE;
+ return;
+ }
+
+ /* Supported method found. */
+ if (pg_strcasecmp(firstpart, "gzip") == 0)
+ *methodres = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(firstpart, "none") == 0)
+ *methodres = COMPRESSION_NONE;
You don't need to test for gzip and none in two places each. Just make
the block with the "It does not match ..." comment the "else" clause
for this last part.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jan 20, 2022 at 10:25:43AM -0500, Robert Haas wrote:
You don't need to test for gzip and none in two places each. Just make
the block with the "It does not match ..." comment the "else" clause
for this last part.
Indeed, that looks better. I have done an extra pass on this stuff
this morning, and applied it, so we should be done here.
--
Michael
On Thu, Jan 20, 2022 at 9:18 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jan 20, 2022 at 10:25:43AM -0500, Robert Haas wrote:
You don't need to test for gzip and none in two places each. Just make
the block with the "It does not match ..." comment the "else" clause
for this last part.Indeed, that looks better. I have done an extra pass on this stuff
this morning, and applied it, so we should be done here.
Thanks. One thing I just noticed is that the enum we're using here is
called WalCompressionMethod. But we're not compressing WAL. We're
compressing tarfiles of the data directory.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jan 21, 2022 at 09:57:41AM -0500, Robert Haas wrote:
Thanks. One thing I just noticed is that the enum we're using here is
called WalCompressionMethod. But we're not compressing WAL. We're
compressing tarfiles of the data directory.
Also, having this enum in walmethods.h is perhaps not the best place
either, even more if you plan to use that in pg_basebackup for the
server-side compression. One idea is to rename this enum to
DataCompressionMethod, moving it into a new header, like common.h as
of the attached.
--
Michael
Attachments:
0001-Move-declaration-in-new-header-for-pg_basebackup.patchtext/x-diff; charset=us-asciiDownload
From 3d2aec3abdfa8d2c6faf639c1cbaaf73d9064f11 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Sat, 22 Jan 2022 14:43:19 +0900
Subject: [PATCH] Move declaration in new header for pg_basebackup
---
src/bin/pg_basebackup/common.h | 26 ++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 5 +++--
src/bin/pg_basebackup/pg_receivewal.c | 6 +++---
src/bin/pg_basebackup/walmethods.c | 12 ++++++------
src/bin/pg_basebackup/walmethods.h | 15 ++++-----------
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 43 insertions(+), 23 deletions(-)
create mode 100644 src/bin/pg_basebackup/common.h
diff --git a/src/bin/pg_basebackup/common.h b/src/bin/pg_basebackup/common.h
new file mode 100644
index 0000000000..abad3b25ca
--- /dev/null
+++ b/src/bin/pg_basebackup/common.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * common.h
+ *
+ * Common declarations shared across all tools of src/bin/pg_basebackup/.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/common.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BASEBACKUP_COMMON_H
+#define BASEBACKUP_COMMON_H
+
+/* Types of compression supported */
+typedef enum
+{
+ COMPRESSION_GZIP,
+ COMPRESSION_LZ4,
+ COMPRESSION_NONE
+} DataCompressionMethod;
+
+#endif /* BASEBACKUP_COMMON_H */
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d5b0ade10d..2c66e4fd87 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
#include "access/xlog_internal.h"
#include "bbstreamer.h"
+#include "common.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -123,7 +124,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
+static DataCompressionMethod compressmethod = COMPRESSION_NONE;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -943,7 +944,7 @@ parse_max_rate(char *src)
* from the parsed results.
*/
static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
+parse_compress_options(char *src, DataCompressionMethod *methodres,
int *levelres)
{
char *sep;
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ccb215c398..9f2e633c80 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -52,7 +52,7 @@ static bool do_drop_slot = false;
static bool do_sync = true;
static bool synchronous = false;
static char *replication_slot = NULL;
-static WalCompressionMethod compression_method = COMPRESSION_NONE;
+static DataCompressionMethod compression_method = COMPRESSION_NONE;
static XLogRecPtr endpos = InvalidXLogRecPtr;
@@ -114,7 +114,7 @@ usage(void)
*/
static bool
is_xlogfilename(const char *filename, bool *ispartial,
- WalCompressionMethod *wal_compression_method)
+ DataCompressionMethod *wal_compression_method)
{
size_t fname_len = strlen(filename);
size_t xlog_pattern_len = strspn(filename, "0123456789ABCDEF");
@@ -285,7 +285,7 @@ FindStreamingStart(uint32 *tli)
{
uint32 tli;
XLogSegNo segno;
- WalCompressionMethod wal_compression_method;
+ DataCompressionMethod wal_compression_method;
bool ispartial;
if (!is_xlogfilename(dirent->d_name,
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index f74bd13315..b87afec5a6 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -49,7 +49,7 @@
typedef struct DirectoryMethodData
{
char *basedir;
- WalCompressionMethod compression_method;
+ DataCompressionMethod compression_method;
int compression_level;
bool sync;
const char *lasterrstring; /* if set, takes precedence over lasterrno */
@@ -561,7 +561,7 @@ dir_get_file_size(const char *pathname)
return statbuf.st_size;
}
-static WalCompressionMethod
+static DataCompressionMethod
dir_compression_method(void)
{
return dir_data->compression_method;
@@ -608,7 +608,7 @@ dir_finish(void)
WalWriteMethod *
CreateWalDirectoryMethod(const char *basedir,
- WalCompressionMethod compression_method,
+ DataCompressionMethod compression_method,
int compression_level, bool sync)
{
WalWriteMethod *method;
@@ -662,7 +662,7 @@ typedef struct TarMethodData
{
char *tarfilename;
int fd;
- WalCompressionMethod compression_method;
+ DataCompressionMethod compression_method;
int compression_level;
bool sync;
TarMethodFile *currentfile;
@@ -983,7 +983,7 @@ tar_get_file_size(const char *pathname)
return -1;
}
-static WalCompressionMethod
+static DataCompressionMethod
tar_compression_method(void)
{
return tar_data->compression_method;
@@ -1322,7 +1322,7 @@ tar_finish(void)
*/
WalWriteMethod *
CreateWalTarMethod(const char *tarbase,
- WalCompressionMethod compression_method,
+ DataCompressionMethod compression_method,
int compression_level, bool sync)
{
WalWriteMethod *method;
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 2dfb353baa..2dc2e6c913 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -9,6 +9,7 @@
*-------------------------------------------------------------------------
*/
+#include "common.h"
typedef void *Walfile;
@@ -19,14 +20,6 @@ typedef enum
CLOSE_NO_RENAME
} WalCloseMethod;
-/* Types of compression supported */
-typedef enum
-{
- COMPRESSION_GZIP,
- COMPRESSION_LZ4,
- COMPRESSION_NONE
-} WalCompressionMethod;
-
/*
* A WalWriteMethod structure represents the different methods used
* to write the streaming WAL as it's received.
@@ -67,7 +60,7 @@ struct WalWriteMethod
char *(*get_file_name) (const char *pathname, const char *temp_suffix);
/* Returns the compression method */
- WalCompressionMethod (*compression_method) (void);
+ DataCompressionMethod (*compression_method) (void);
/*
* Write count number of bytes to the file, and return the number of bytes
@@ -103,10 +96,10 @@ struct WalWriteMethod
* not all those required for pg_receivewal)
*/
WalWriteMethod *CreateWalDirectoryMethod(const char *basedir,
- WalCompressionMethod compression_method,
+ DataCompressionMethod compression_method,
int compression, bool sync);
WalWriteMethod *CreateWalTarMethod(const char *tarbase,
- WalCompressionMethod compression_method,
+ DataCompressionMethod compression_method,
int compression, bool sync);
/* Cleanup routines for previously-created methods */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 89249ecc97..e31c3ea641 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -520,6 +520,7 @@ DR_sqlfunction
DR_transientrel
DSA
DWORD
+DataCompressionMethod
DataDumperPtr
DataPageDeleteStack
DatabaseInfo
@@ -2863,7 +2864,6 @@ WaitEventTimeout
WaitPMResult
WalCloseMethod
WalCompression
-WalCompressionMethod
WalLevel
WalRcvData
WalRcvExecResult
--
2.34.1
On Sat, Jan 22, 2022 at 12:47 AM Michael Paquier <michael@paquier.xyz> wrote:
Also, having this enum in walmethods.h is perhaps not the best place
either, even more if you plan to use that in pg_basebackup for the
server-side compression. One idea is to rename this enum to
DataCompressionMethod, moving it into a new header, like common.h as
of the attached.
Well, we also have CompressionAlgorithm competing for the same job.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 25, 2022 at 03:14:13PM -0500, Robert Haas wrote:
On Sat, Jan 22, 2022 at 12:47 AM Michael Paquier <michael@paquier.xyz> wrote:
Also, having this enum in walmethods.h is perhaps not the best place
either, even more if you plan to use that in pg_basebackup for the
server-side compression. One idea is to rename this enum to
DataCompressionMethod, moving it into a new header, like common.h as
of the attached.Well, we also have CompressionAlgorithm competing for the same job.
Sure, but I don't think that it is a good idea to unify that yet, at
least not until pg_dump is able to handle LZ4 as an option, as the
main benefit that we'd gain here is to be able to change the code to a
switch/case without defaults where we would detect code paths that
require a refresh once adding support for a new option.
--
Michael
On Tue, Jan 25, 2022 at 8:15 PM Michael Paquier <michael@paquier.xyz> wrote:
Sure, but I don't think that it is a good idea to unify that yet, at
least not until pg_dump is able to handle LZ4 as an option, as the
main benefit that we'd gain here is to be able to change the code to a
switch/case without defaults where we would detect code paths that
require a refresh once adding support for a new option.
I think those places could just throw a "lz4 compression is not
supported" elog() and then you could just grep for everyplace where
that string appears. But I am not of a mind to fight about it. I was
just pointing out the duplication.
--
Robert Haas
EDB: http://www.enterprisedb.com