refactoring basebackup.c
Hi,
I'd like to propose a fairly major refactoring of the server's
basebackup.c. The current code isn't horrific or anything, but the
base backup mechanism has grown quite a few features over the years
and all of the code knows about all of the features. This is going to
make it progressively more difficult to add additional features, and I
have a few in mind that I'd like to add, as discussed below and also
on several other recent threads.[1][2] The attached patch set shows
what I have in mind. It needs more work, but I believe that there's
enough here for someone to review the overall direction, and even some
of the specifics, and hopefully give me some useful feedback.
This patch set is built around the idea of creating two new
abstractions, a base backup sink -- or bbsink -- and a base backup
archiver -- or bbarchiver. Each of these works like a foreign data
wrapper or custom scan or TupleTableSlot. That is, there's a table of
function pointers that act like method callbacks. Every implementation
can allocate a struct of sufficient size for its own bookkeeping data,
and the first member of the struct is always the same, and basically
holds the data that all implementations must store, including a
pointer to the table of function pointers. If we were using C++,
bbarchiver and bbsink would be abstract base classes.
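To make that concrete, here's a minimal sketch of the pattern, using
the bbsink definitions from patch 0005 below; bbsink_example and its
byte counter are invented here purely for illustration:

struct bbsink
{
    const bbsink_ops *bbs_ops;   /* table of callback pointers */
    bbsink     *bbs_next;        /* next sink in the chain, if any */
};

/* An implementation embeds a bbsink as its first member and adds
 * whatever bookkeeping it needs after that. */
typedef struct bbsink_example
{
    bbsink      base;
    uint64      bytes_seen;
} bbsink_example;

static void
bbsink_example_archive_contents(bbsink *sink, const char *data,
                                size_t len)
{
    bbsink_example *mysink = (bbsink_example *) sink;

    mysink->bytes_seen += len;   /* private bookkeeping */
    bbsink_archive_contents(sink->bbs_next, data, len); /* pass it on */
}

Both abstractions follow this same shape.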
They represent closely-related concepts, so much so that I initially
thought we could get by with just one new abstraction layer. I found
through experimentation that this did not work well, so I split it up into
two and that worked a lot better. The distinction is this: a bbsink is
something to which you can send a bunch of archives -- currently, each
would be a tarfile -- and also a backup manifest. A bbarchiver is
something to which you send every file in the data directory
individually, or at least the ones that are getting backed up, plus
any that are being injected into the backup (e.g. the backup_label).
Commonly, a bbsink will do something with the data and then forward it
to a subsequent bbsink, or a bbarchiver will do something with the
data and then forward it to a subsequent bbarchiver or bbsink. For
example, there's a bbarchiver_tar object which, like any bbarchiver,
sees all the files and their contents as input. The output is a
tarfile, which gets sent to a bbsink. As things stand in the patch set
now, the tar archives are ultimately sent to the "libpq" bbsink, which
sends them to the client.
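Assembled in code, that chain might look about like this;
bbsink_libpq_new() is real (see patch 0006), while bbarchiver_tar_new()
is just my shorthand for however the tar archiver gets constructed:

bbsink     *sink = bbsink_libpq_new();           /* sends to client */
bbarchiver *archiver = bbarchiver_tar_new(sink); /* emits tar data */

basebackup.c then feeds each file to the archiver, which writes
tar-formatted bytes to the sink.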
In the future, we could have other bbarchivers. For example, we could
add "pax", "zip", or "cpio" bbarchiver which produces archives of that
format, and any given backup could choose which one to use. Or, we
could have a bbarchiver that runs each individual file through a
compression algorithm and then forwards the resulting data to a
subsequent bbarchiver. That would make it easy to produce a tarfile of
individually compressed files, which is one possible way of creating a
seekable archive.[3] Likewise, we could have other bbsinks. For
example, we could have a "localdisk" bbsink that causes the server to
write the backup somewhere in the local filesystem instead of
streaming it out over libpq. Or, we could have an "s3" bbsink that
writes the archives to S3. We could also have bbsinks that compress
the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
and forward the resulting compressed archives to the next bbsink in
the chain. I'm not trying to pass judgement on whether any of these
particular things are things we want to do, nor am I saying that this
patch set solves all the problems with doing them. However, I believe
it will make such things a whole lot easier to implement, because all
of the knowledge about whatever new functionality is being added is
centralized in one place, rather than being spread across the entirety
of basebackup.c. As an example of this, look at how 0010 changes
basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
knows anything that is tar-specific, whereas right now it knows about
tar-specific things in many places.
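As a sketch of how little such a filtering bbsink needs to implement,
here is what a compressing sink's callback table might look like. Only
the two archive callbacks do real work; everything else can use the
bbsink_forward_* helpers from patch 0005. The compress_* functions are
hypothetical:

static const bbsink_ops bbsink_compress_ops = {
    .begin_backup = bbsink_forward_begin_backup,
    .begin_archive = bbsink_forward_begin_archive,
    .archive_contents = compress_archive_contents, /* compress+forward */
    .end_archive = compress_end_archive,           /* flush+forward */
    .begin_manifest = bbsink_forward_begin_manifest,
    .manifest_contents = bbsink_forward_manifest_contents,
    .end_manifest = bbsink_forward_end_manifest,
    .end_backup = bbsink_forward_end_backup
};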
Here's an overview of this patch set:
0001-0003 are cleanup patches that I have posted for review on
separate threads.[4][5] They are included here to make it easy to
apply this whole series if someone wishes to do so.
0004 is a minor refactoring that reduces by 1 the number of functions
in basebackup.c that know about the specifics of tarfiles. It is just
a preparatory patch and probably not very interesting.
0005 invents the bbsink abstraction.
0006 creates basebackup_libpq.c and moves all code that knows about
the details of sending archives via libpq there. The functionality is
exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.
0007 creates basebackup_throttle.c and moves all code that knows about
throttling backups there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_throttle. This means that
the throttling logic could be reused to throttle output to any final
destination. Essentially, this is a bbsink that just passes everything
it gets through to the next bbsink, but with a rate limit. If
throttling's not enabled, no bbsink_throttle object is created, so all
of the throttling code is completely out of the execution pipeline.
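In code, the interposition amounts to something like this; the
constructor name is my invention, but opt->maxrate is the existing
option:

bbsink *sink = bbsink_libpq_new();

if (opt->maxrate > 0)
    sink = bbsink_throttle_new(sink, opt->maxrate);

When no rate limit was requested, the chain is just the libpq sink and
nothing throttling-related ever runs.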
0008 creates basebackup_progress.c and moves all code that knows about
progress reporting there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_progress. Since the
abstraction doesn't fit perfectly in this case, some extra functions
are added to work around the problem. This is not entirely elegant,
but I think it's still an improvement over what we have now, and
I don't have a better idea.
0009 invents the bbarchiver abstraction.
0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
bbarchiver, and refactors basebackup.c to make use of them. The tar
bbarchiver puts the files it sees into tar archives and forwards the
resulting archives to a bbsink. The tarsize bbarchiver is used to
support the PROGRESS option to the BASE_BACKUP command. It just
estimates the size of the backup by summing up the file sizes without
reading them. This approach is good for a couple of reasons. First,
without something like this, it's impossible to keep basebackup.c from
knowing something about the tar format, because the PROGRESS option
doesn't just figure out how big the files to be backed up are: it
figures out how big it thinks the archives will be, and that involves
tar-specific considerations. This area needs more work, as the whole
idea of measuring progress by estimating the archive size is going to
break down as soon as server-side compression is in the picture.
Second, this makes the code path that we use for figuring out the
backup size details much more similar to the path we use for
performing the actual backup. For instance, with this patch, we
include the exact same files in the calculation that we will include
in the backup, and in the same order, something that's not true today.
The basebackup_tar.c file added by this patch is sadly lacking in
comments, which I will add in a future version of the patch set. I
think, though, that it will not be too hard to see what's going on here.
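For reference, the per-file arithmetic the tarsize bbarchiver performs
is the same as the existing sizeonly path, using the helpers introduced
in patch 0003:

/* Estimated archive footprint of one file: one header block, the
 * file contents, and zero-padding out to the next block boundary. */
size += TAR_BLOCK_SIZE;
size += statbuf.st_size;
size += tarPaddingBytesRequired(statbuf.st_size);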
0011 invents another new kind of bbarchiver. This bbarchiver just
eavesdrops on the stream of files to facilitate backup manifest
construction, and then forwards everything through to a subsequent
bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
used. This patch is a bit clunky at the moment and needs some polish,
but it is another demonstration of how these abstractions can be used
to simplify basebackup.c, so that basebackup.c only has to worry about
determining what should be backed up and not have to worry much about
all the specific things that need to be done as part of that.
Although this patch set adds quite a bit of code on net, it makes
basebackup.c considerably smaller and simpler, removing more than 400
lines of code from that file, about 20% of the current total. There
are some gratifying changes vs. the status quo. For example, in
master, we have this:
sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
Notably, the sizeonly flag makes the function not do what the name of
the function suggests that it does. Also, we've got to pass some extra
fields through to enable specific features. With the patch set, the
equivalent function looks like this:
archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
List *tablespaces, bool sendtblspclinks)
The question "what should I do with the directories and files we find
as we recurse?" is now answered by the choice of which bbarchiver to
pass to the function, rather than by the values of sizeonly, manifest,
and spcoid. That's not night and day, but I think it's better,
especially as you imagine adding more features in the future. The
really important part, for me, is that you can make the bbarchiver do
anything you like without needing to make any more changes to this
function. It just arranges to invoke your callbacks. You take it from
there.
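So a caller ends up with something like the following, where the
constructor names are again invented for illustration:

bbarchiver *archiver;

if (sizeonly)
    archiver = bbarchiver_tarsize_new();
else
    archiver = bbarchiver_tar_new(sink);

archive_directory(archiver, ".", 1, tablespaces, sendtblspclinks);

Adding a new behavior means adding a new bbarchiver, not another flag.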
One pretty major question that this patch set doesn't address is what
the user interface for any of the hypothetical features mentioned
above ought to look like, or how basebackup.c ought to support them.
The syntax for the BASE_BACKUP command, like the contents of
basebackup.c, has grown organically, and doesn't seem to be very
scalable. Also, the wire protocol - a series of CopyData results which
the client is entirely responsible for knowing how to interpret and
about which the server provides only minimal information - doesn't
much lend itself to extensibility. Some careful design work is likely
needed in both areas, and this patch does not try to do any of it. I
am quite interested in discussing those questions, but I felt that
they weren't the most important problems to solve first.
What do you all think?
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1]: /messages/by-id/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com
[2]: /messages/by-id/CA+TgmoYr7+-0_vyQoHbTP5H3QGZFgfhnrn6ewDteF=kUqkG=Fw@mail.gmail.com
[3]: /messages/by-id/CA+TgmoZQCoCyPv6fGoovtPEZF98AXCwYDnSB0=p5XtxNY68r_A@mail.gmail.com and following
[4]: /messages/by-id/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
[5]: /messages/by-id/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com
So it might be good if I'd remembered to attach the patches. Let's try
that again.
...Robert
Attachments:
v1-0005-Introduce-bbsink-abstraction.patch
From bc44ac988e18e9ea8534ca496b0d471b7d9ad09f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:11:33 -0400
Subject: [PATCH v1 05/11] Introduce bbsink abstraction.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup_sink.c | 72 +++++++++
src/include/replication/basebackup_sink.h | 176 ++++++++++++++++++++++
3 files changed, 249 insertions(+)
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..25d56478f4 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_sink.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..bd0298990d
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,72 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+void
+bbsink_forward_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_backup(sink->bbs_next, startptr, starttli, tablespaces);
+}
+
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+void
+bbsink_forward_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_archive_contents(sink->bbs_next, data, len);
+}
+
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+void
+bbsink_forward_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_manifest_contents(sink->bbs_next, data, len);
+}
+
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..050cf1180d
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,176 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * From a logical point of view, a basebackup sink's callbacks are invoked
+ * after the source files read from the data directory have been assembled
+ * into archives (e.g. by creating one tar file per tablespace) but before
+ * those archives are sent to the client. In reality, processing is
+ * interleaved, with archives being generated incrementally and these
+ * callbacks being invoked on the archive fragments as they are generated.
+ * The point, however, is that a basebackup sink shouldn't be trying to
+ * do anything with individual data files, nor should it do anything that
+ * depends on a particular choice of archive format. It should only
+ * perform processing that treats the archives passed to it -- and the
+ * backup manifest -- as opaque blobs of bytes.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * If a bbsink needs to store additional state, it can allocate a larger
+ * structure whose first element is a bbsink.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ bbsink *bbs_next;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline functions
+ * rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /* This callback is invoked just once, at the very start of the backup. */
+ void (*begin_backup)(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces);
+
+ /*
+ * For each archive produced by the backup process, there will be one call
+ * to the begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ */
+ void (*begin_archive)(bbsink *sink, const char *archive_name);
+ void (*archive_contents)(bbsink *sink, const char *data, size_t len);
+ void (*end_archive)(bbsink *sink);
+
+ /*
+ * After all archives have been sent, and provided that the caller has
+ * requested a backup manifest, there will be one call to the
+ * begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback.
+ */
+ void (*begin_manifest)(bbsink *sink);
+ void (*manifest_contents)(bbsink *sink, const char *data, size_t len);
+ void (*end_manifest)(bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup)(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, XLogRecPtr startptr, TimeLineID starttli,
+ List *tablespaces)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->begin_backup(sink, startptr, starttli, tablespaces);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->archive_contents(sink, data, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->manifest_contents(sink, data, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli,
+ List *tablespaces);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, const char *data,
+ size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, const char *data,
+ size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+#endif
--
2.24.2 (Apple Git-127)
v1-0002-Minor-code-cleanup-for-perform_base_backup.patch
From 97b65d1dab6fe066b26375f249f56d7349afe3b1 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 6 May 2020 17:50:33 -0400
Subject: [PATCH v1 02/11] Minor code cleanup for perform_base_backup().
Merge two calls to sendDir() that are exactly the same except for
the fifth argument. Adjust comments to match.
Also, don't bother checking whether tblspc_map_file is NULL. We
initialize it in all cases, so it can't be.
Patch by me, reviewed by Amit Kapila.
Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
---
src/backend/replication/basebackup.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 98be1b854b..084db4b2e5 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -269,7 +269,7 @@ perform_base_backup(basebackup_options *opt)
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
- StringInfo tblspc_map_file = NULL;
+ StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
List *tablespaces = NIL;
@@ -424,25 +424,23 @@ perform_base_backup(basebackup_options *opt)
if (ti->path == NULL)
{
struct stat statbuf;
+ bool sendtblspclinks = true;
/* In the main tar, include the backup_label first... */
sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
&manifest);
- /*
- * Send tablespace_map file if required and then the bulk of
- * the files.
- */
- if (tblspc_map_file && opt->sendtblspcmapfile)
+ /* Then the tablespace_map file, if required... */
+ if (opt->sendtblspcmapfile)
{
sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
- sendDir(".", 1, false, tablespaces, false,
- &manifest, NULL);
+ sendtblspclinks = false;
}
- else
- sendDir(".", 1, false, tablespaces, true,
- &manifest, NULL);
+
+ /* Then the bulk of the files... */
+ sendDir(".", 1, false, tablespaces, sendtblspclinks,
+ &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
--
2.24.2 (Apple Git-127)
v1-0003-Assorted-cleanup-of-tar-related-code.patch
From 3d7c41530d5c3c81162becf95c12b2babdbec113 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 24 Apr 2020 10:38:10 -0400
Subject: [PATCH v1 03/11] Assorted cleanup of tar-related code.
Introduce TAR_BLOCK_SIZE and replace many instances of 512 with
the new constant. Introduce function tarPaddingBytesRequired
and use it to replace numerous repetitions of (x + 511) & ~511.
Add preprocessor guards against multiple inclusion to pgtar.h.
Reformat the prototype for tarCreateHeader so it doesn't extend
beyond 80 characters.
---
src/backend/replication/basebackup.c | 29 ++++++++++++------
src/bin/pg_basebackup/pg_basebackup.c | 44 +++++++++++++--------------
src/bin/pg_basebackup/walmethods.c | 25 +++++++++------
src/bin/pg_dump/pg_backup_tar.c | 30 +++++++++---------
src/include/pgtar.h | 23 ++++++++++++--
5 files changed, 93 insertions(+), 58 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 084db4b2e5..54a746abde 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -670,7 +670,11 @@ perform_base_backup(basebackup_options *opt)
errmsg("unexpected WAL file size \"%s\"", walFileName)));
}
- /* wal_segment_size is a multiple of 512, so no need for padding */
+ /*
+ * wal_segment_size is a multiple of TAR_BLOCK_SIZE, so no need
+ * for padding.
+ */
+ Assert(wal_segment_size % TAR_BLOCK_SIZE == 0);
FreeFile(fp);
@@ -1123,11 +1127,11 @@ sendFileWithContent(const char *filename, const char *content,
pq_putmessage('d', content, len);
update_basebackup_progress(len);
- /* Pad to 512 byte boundary, per tar format requirements */
- pad = ((len + 511) & ~511) - len;
+ /* Pad to a multiple of the tar block size. */
+ pad = tarPaddingBytesRequired(len);
if (pad > 0)
{
- char buf[512];
+ char buf[TAR_BLOCK_SIZE];
MemSet(buf, 0, pad);
pq_putmessage('d', buf, pad);
@@ -1496,9 +1500,14 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (sent || sizeonly)
{
- /* Add size, rounded up to 512byte block */
- size += ((statbuf.st_size + 511) & ~511);
- size += 512; /* Size of the header of the file */
+ /* Add size. */
+ size += statbuf.st_size;
+
+ /* Pad to a multiple of the tar block size. */
+ size += tarPaddingBytesRequired(statbuf.st_size);
+
+ /* Size of the header for the file. */
+ size += TAR_BLOCK_SIZE;
}
}
else
@@ -1789,11 +1798,11 @@ sendFile(const char *readfilename, const char *tarfilename,
}
/*
- * Pad to 512 byte boundary, per tar format requirements. (This small
+ * Pad to a block boundary, per tar format requirements. (This small
* piece of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = ((len + 511) & ~511) - len;
+ pad = tarPaddingBytesRequired(len);
if (pad > 0)
{
MemSet(buf, 0, pad);
@@ -1827,7 +1836,7 @@ static int64
_tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[512];
+ char h[TAR_BLOCK_SIZE];
enum tarError rc;
if (!sizeonly)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2e9035d613..29407d5644 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -62,7 +62,7 @@ typedef struct WriteTarState
int tablespacenum;
char filename[MAXPGPATH];
FILE *tarfile;
- char tarhdr[512];
+ char tarhdr[TAR_BLOCK_SIZE];
bool basetablespace;
bool in_tarhdr;
bool skip_file;
@@ -1024,7 +1024,7 @@ writeTarData(WriteTarState *state, char *buf, int r)
static void
ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
{
- char zerobuf[1024];
+ char zerobuf[TAR_BLOCK_SIZE * 2];
WriteTarState state;
memset(&state, 0, sizeof(state));
@@ -1168,7 +1168,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
if (state.basetablespace && writerecoveryconf)
{
- char header[512];
+ char header[TAR_BLOCK_SIZE];
/*
* If postgresql.auto.conf has not been found in the streamed data,
@@ -1187,7 +1187,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_file_create_mode, 04000, 02000,
time(NULL));
- padding = ((recoveryconfcontents->len + 511) & ~511) - recoveryconfcontents->len;
+ padding = tarPaddingBytesRequired(recoveryconfcontents->len);
writeTarData(&state, header, sizeof(header));
writeTarData(&state, recoveryconfcontents->data,
@@ -1223,7 +1223,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
*/
if (strcmp(basedir, "-") == 0 && manifest)
{
- char header[512];
+ char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
initPQExpBuffer(&buf);
@@ -1241,7 +1241,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
termPQExpBuffer(&buf);
}
- /* 2 * 512 bytes empty data at end of file */
+ /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
writeTarData(&state, zerobuf, sizeof(zerobuf));
#ifdef HAVE_LIBZ
@@ -1302,9 +1302,9 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
*
* To do this, we have to process the individual files inside the TAR
* stream. The stream consists of a header and zero or more chunks,
- * all 512 bytes long. The stream from the server is broken up into
- * smaller pieces, so we have to track the size of the files to find
- * the next header structure.
+ * each with a length equal to TAR_BLOCK_SIZE. The stream from the
+ * server is broken up into smaller pieces, so we have to track the
+ * size of the files to find the next header structure.
*/
int rr = r;
int pos = 0;
@@ -1317,17 +1317,17 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
* We're currently reading a header structure inside the TAR
* stream, i.e. the file metadata.
*/
- if (state->tarhdrsz < 512)
+ if (state->tarhdrsz < TAR_BLOCK_SIZE)
{
/*
* Copy the header structure into tarhdr in case the
- * header is not aligned to 512 bytes or it's not returned
+ * header is not aligned properly or it's not returned
* in whole by the last PQgetCopyData call.
*/
int hdrleft;
int bytes2copy;
- hdrleft = 512 - state->tarhdrsz;
+ hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
bytes2copy = (rr > hdrleft ? hdrleft : rr);
memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
@@ -1360,14 +1360,14 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
state->filesz = read_tar_number(&state->tarhdr[124], 12);
state->file_padding_len =
- ((state->filesz + 511) & ~511) - state->filesz;
+ tarPaddingBytesRequired(state->filesz);
if (state->is_recovery_guc_supported &&
state->is_postgresql_auto_conf &&
writerecoveryconf)
{
/* replace tar header */
- char header[512];
+ char header[TAR_BLOCK_SIZE];
tarCreateHeader(header, "postgresql.auto.conf", NULL,
state->filesz + recoveryconfcontents->len,
@@ -1387,7 +1387,7 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
* If we're not skipping the file, write the tar
* header unmodified.
*/
- writeTarData(state, state->tarhdr, 512);
+ writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
}
}
@@ -1424,15 +1424,15 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
int padding;
int tailsize;
- tailsize = (512 - state->file_padding_len) + recoveryconfcontents->len;
- padding = ((tailsize + 511) & ~511) - tailsize;
+ tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
+ padding = tarPaddingBytesRequired(tailsize);
writeTarData(state, recoveryconfcontents->data,
recoveryconfcontents->len);
if (padding)
{
- char zerobuf[512];
+ char zerobuf[TAR_BLOCK_SIZE];
MemSet(zerobuf, 0, sizeof(zerobuf));
writeTarData(state, zerobuf, padding);
@@ -1550,12 +1550,12 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
/*
* No current file, so this must be the header for a new file
*/
- if (r != 512)
+ if (r != TAR_BLOCK_SIZE)
{
pg_log_error("invalid tar block header size: %zu", r);
exit(1);
}
- totaldone += 512;
+ totaldone += TAR_BLOCK_SIZE;
state->current_len_left = read_tar_number(&copybuf[124], 12);
@@ -1565,10 +1565,10 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
#endif
/*
- * All files are padded up to 512 bytes
+ * All files are padded up to a multiple of TAR_BLOCK_SIZE
*/
state->current_padding =
- ((state->current_len_left + 511) & ~511) - state->current_len_left;
+ tarPaddingBytesRequired(state->current_len_left);
/*
* First part of header is zero terminated filename
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index ecff08740c..bd1947d623 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -386,7 +386,7 @@ typedef struct TarMethodFile
{
off_t ofs_start; /* Where does the *header* for this file start */
off_t currpos;
- char header[512];
+ char header[TAR_BLOCK_SIZE];
char *pathname;
size_t pad_to_size;
} TarMethodFile;
@@ -625,7 +625,8 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
if (!tar_data->compression)
{
errno = 0;
- if (write(tar_data->fd, tar_data->currentfile->header, 512) != 512)
+ if (write(tar_data->fd, tar_data->currentfile->header,
+ TAR_BLOCK_SIZE) != TAR_BLOCK_SIZE)
{
save_errno = errno;
pg_free(tar_data->currentfile);
@@ -639,7 +640,8 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
else
{
/* Write header through the zlib APIs but with no compression */
- if (!tar_write_compressed_data(tar_data->currentfile->header, 512, true))
+ if (!tar_write_compressed_data(tar_data->currentfile->header,
+ TAR_BLOCK_SIZE, true))
return NULL;
/* Re-enable compression for the rest of the file */
@@ -665,7 +667,9 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
/* Uncompressed, so pad now */
tar_write_padding_data(tar_data->currentfile, pad_to_size);
/* Seek back to start */
- if (lseek(tar_data->fd, tar_data->currentfile->ofs_start + 512, SEEK_SET) != tar_data->currentfile->ofs_start + 512)
+ if (lseek(tar_data->fd,
+ tar_data->currentfile->ofs_start + TAR_BLOCK_SIZE,
+ SEEK_SET) != tar_data->currentfile->ofs_start + TAR_BLOCK_SIZE)
return NULL;
tar_data->currentfile->currpos = 0;
@@ -778,14 +782,14 @@ tar_close(Walfile f, WalCloseMethod method)
}
/*
- * Get the size of the file, and pad the current data up to the nearest
- * 512 byte boundary.
+ * Get the size of the file, and pad out to a multiple of the tar block
+ * size.
*/
filesize = tar_get_current_pos(f);
- padding = ((filesize + 511) & ~511) - filesize;
+ padding = tarPaddingBytesRequired(filesize);
if (padding)
{
- char zerobuf[512];
+ char zerobuf[TAR_BLOCK_SIZE];
MemSet(zerobuf, 0, padding);
if (tar_write(f, zerobuf, padding) != padding)
@@ -826,7 +830,7 @@ tar_close(Walfile f, WalCloseMethod method)
if (!tar_data->compression)
{
errno = 0;
- if (write(tar_data->fd, tf->header, 512) != 512)
+ if (write(tar_data->fd, tf->header, TAR_BLOCK_SIZE) != TAR_BLOCK_SIZE)
{
/* if write didn't set errno, assume problem is no disk space */
if (errno == 0)
@@ -845,7 +849,8 @@ tar_close(Walfile f, WalCloseMethod method)
}
/* Overwrite the header, assuming the size will be the same */
- if (!tar_write_compressed_data(tar_data->currentfile->header, 512, true))
+ if (!tar_write_compressed_data(tar_data->currentfile->header,
+ TAR_BLOCK_SIZE, true))
return -1;
/* Turn compression back on */
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index d5bfa55646..b4f5942959 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -893,7 +893,7 @@ _CloseArchive(ArchiveHandle *AH)
/*
* EOF marker for tar files is two blocks of NULLs.
*/
- for (i = 0; i < 512 * 2; i++)
+ for (i = 0; i < TAR_BLOCK_SIZE * 2; i++)
{
if (fputc(0, ctx->tarFH) == EOF)
WRITE_ERROR_EXIT;
@@ -1113,7 +1113,7 @@ _tarAddFile(ArchiveHandle *AH, TAR_MEMBER *th)
buf1, buf2);
}
- pad = ((len + 511) & ~511) - len;
+ pad = tarPaddingBytesRequired(len);
for (i = 0; i < pad; i++)
{
if (fputc('\0', th->tarFH) == EOF)
@@ -1130,7 +1130,7 @@ _tarPositionTo(ArchiveHandle *AH, const char *filename)
lclContext *ctx = (lclContext *) AH->formatData;
TAR_MEMBER *th = pg_malloc0(sizeof(TAR_MEMBER));
char c;
- char header[512];
+ char header[TAR_BLOCK_SIZE];
size_t i,
len,
blks;
@@ -1189,17 +1189,19 @@ _tarPositionTo(ArchiveHandle *AH, const char *filename)
th->targetFile, filename);
/* Header doesn't match, so read to next header */
- len = ((th->fileLen + 511) & ~511); /* Padded length */
- blks = len >> 9; /* # of 512 byte blocks */
+ len = th->fileLen;
+ len += tarPaddingBytesRequired(th->fileLen);
+ blks = len / TAR_BLOCK_SIZE; /* # of tar blocks */
for (i = 0; i < blks; i++)
- _tarReadRaw(AH, &header[0], 512, NULL, ctx->tarFH);
+ _tarReadRaw(AH, &header[0], TAR_BLOCK_SIZE, NULL, ctx->tarFH);
if (!_tarGetHeader(AH, th))
fatal("could not find header for file \"%s\" in tar archive", filename);
}
- ctx->tarNextMember = ctx->tarFHpos + ((th->fileLen + 511) & ~511);
+ ctx->tarNextMember = ctx->tarFHpos + th->fileLen
+ + tarPaddingBytesRequired(th->fileLen);
th->pos = 0;
return th;
@@ -1210,7 +1212,7 @@ static int
_tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
{
lclContext *ctx = (lclContext *) AH->formatData;
- char h[512];
+ char h[TAR_BLOCK_SIZE];
char tag[100 + 1];
int sum,
chk;
@@ -1223,12 +1225,12 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
/* Save the pos for reporting purposes */
hPos = ctx->tarFHpos;
- /* Read a 512 byte block, return EOF, exit if short */
- len = _tarReadRaw(AH, h, 512, NULL, ctx->tarFH);
+ /* Read the next tar block, return EOF, exit if short */
+ len = _tarReadRaw(AH, h, TAR_BLOCK_SIZE, NULL, ctx->tarFH);
if (len == 0) /* EOF */
return 0;
- if (len != 512)
+ if (len != TAR_BLOCK_SIZE)
fatal(ngettext("incomplete tar header found (%lu byte)",
"incomplete tar header found (%lu bytes)",
len),
@@ -1248,7 +1250,7 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
{
int i;
- for (i = 0; i < 512; i++)
+ for (i = 0; i < TAR_BLOCK_SIZE; i++)
{
if (h[i] != 0)
{
@@ -1294,12 +1296,12 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
static void
_tarWriteHeader(TAR_MEMBER *th)
{
- char h[512];
+ char h[TAR_BLOCK_SIZE];
tarCreateHeader(h, th->targetFile, NULL, th->fileLen,
0600, 04000, 02000, time(NULL));
/* Now write the completed header. */
- if (fwrite(h, 1, 512, th->tarFH) != 512)
+ if (fwrite(h, 1, TAR_BLOCK_SIZE, th->tarFH) != TAR_BLOCK_SIZE)
WRITE_ERROR_EXIT;
}
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 0a875903a7..0f08dc0c2c 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,6 +11,10 @@
*
*-------------------------------------------------------------------------
*/
+#ifndef PG_TAR_H
+#define PG_TAR_H
+
+#define TAR_BLOCK_SIZE 512
enum tarError
{
@@ -19,8 +23,23 @@ enum tarError
TAR_SYMLINK_TOO_LONG
};
-extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget,
- pgoff_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+extern enum tarError tarCreateHeader(char *h, const char *filename,
+ const char *linktarget, pgoff_t size,
+ mode_t mode, uid_t uid, gid_t gid,
+ time_t mtime);
extern uint64 read_tar_number(const char *s, int len);
extern void print_tar_number(char *s, int len, uint64 val);
extern int tarChecksum(char *header);
+
+/*
+ * Compute the number of padding bytes required for an entry in a tar
+ * archive. We must pad out to a multiple of TAR_BLOCK_SIZE. Since that's
+ * a power of 2, we can use TYPEALIGN().
+ */
+static inline size_t
+tarPaddingBytesRequired(size_t len)
+{
+ return TYPEALIGN(TAR_BLOCK_SIZE, len) - len;
+}
+
+#endif
--
2.24.2 (Apple Git-127)
v1-0004-Recast-_tarWriteDirectory-as-convert_link_to_dire.patch
From c806eb27fc1245aa5a10247320caa8614687662e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v1 04/11] Recast _tarWriteDirectory as
convert_link_to_directory.
So that it doesn't get tangled up in tar-specific considerations.
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 54a746abde..73f6413faa 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -70,8 +70,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1364,7 +1363,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1380,7 +1381,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1392,7 +1395,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1872,12 +1877,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1886,8 +1890,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.2 (Apple Git-127)
v1-0001-Don-t-export-basebackup.c-s-sendTablespace.patch
From a730ccb616e5a1f206667744c94dae82233cad33 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 29 Apr 2020 11:57:56 -0400
Subject: [PATCH v1 01/11] Don't export basebackup.c's sendTablespace().
Commit 72d422a5227ef6f76f412486a395aba9f53bf3f0 made xlog.c call
sendTablespace() with the 'sizeonly' argument set to true, which
required basebackup.c to export sendTablespace(). However, that's
kind of ugly, so instead defer the call to sendTablespace() until
basebackup.c regains control. That way, it can still be a static
function.
Patch by me, reviewed by Amit Kapila.
Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
---
src/backend/access/transam/xlog.c | 14 ++------------
src/backend/access/transam/xlogfuncs.c | 4 ++--
src/backend/replication/basebackup.c | 21 ++++++++++++++-------
src/include/access/xlog.h | 2 +-
src/include/replication/basebackup.h | 6 ------
5 files changed, 19 insertions(+), 28 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d3d670928..85eabbceb0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10468,8 +10468,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
XLogRecPtr
do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
StringInfo labelfile, List **tablespaces,
- StringInfo tblspcmapfile, bool infotbssize,
- bool needtblspcmapfile)
+ StringInfo tblspcmapfile, bool needtblspcmapfile)
{
bool exclusive = (labelfile == NULL);
bool backup_started_in_recovery = false;
@@ -10689,14 +10688,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
datadirpathlen = strlen(DataDir);
- /*
- * Report that we are now estimating the total backup size
- * if we're streaming base backup as requested by pg_basebackup
- */
- if (tablespaces)
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
-
/* Collect information about all tablespaces */
tblspcdir = AllocateDir("pg_tblspc");
while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
@@ -10761,8 +10752,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
ti->oid = pstrdup(de->d_name);
ti->path = pstrdup(buflinkpath.data);
ti->rpath = relpath ? pstrdup(relpath) : NULL;
- ti->size = infotbssize ?
- sendTablespace(fullpath, ti->oid, true, NULL) : -1;
+ ti->size = -1;
if (tablespaces)
*tablespaces = lappend(*tablespaces, ti);
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 00e1b33ed5..290658b22c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -76,7 +76,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
if (exclusive)
{
startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL,
- NULL, NULL, false, true);
+ NULL, NULL, true);
}
else
{
@@ -94,7 +94,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
register_persistent_abort_backup_handler();
startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file,
- NULL, tblspc_map_file, false, true);
+ NULL, tblspc_map_file, true);
}
PG_RETURN_LSN(startpoint);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbdc28ec39..98be1b854b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -58,6 +58,8 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
+static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+ struct backup_manifest_info *manifest);
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
@@ -307,8 +309,7 @@ perform_base_backup(basebackup_options *opt)
PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
labelfile, &tablespaces,
- tblspc_map_file,
- opt->progress, opt->sendtblspcmapfile);
+ tblspc_map_file, opt->sendtblspcmapfile);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -337,10 +338,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
- if (opt->progress)
- ti->size = sendDir(".", 1, true, tablespaces, true, NULL, NULL);
- else
- ti->size = -1;
+ ti->size = -1;
tablespaces = lappend(tablespaces, ti);
/*
@@ -349,10 +347,19 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+
foreach(lc, tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
+ if (tmp->path == NULL)
+ tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
+ NULL);
+ else
+ tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ NULL);
backup_total += tmp->size;
}
}
@@ -1141,7 +1148,7 @@ sendFileWithContent(const char *filename, const char *content,
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
-int64
+static int64
sendTablespace(char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index e917dfe92d..347a38f57c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -372,7 +372,7 @@ typedef enum SessionBackupState
extern XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
TimeLineID *starttli_p, StringInfo labelfile,
- List **tablespaces, StringInfo tblspcmapfile, bool infotbssize,
+ List **tablespaces, StringInfo tblspcmapfile,
bool needtblspcmapfile);
extern XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive,
TimeLineID *stoptli_p);
diff --git a/src/include/replication/basebackup.h b/src/include/replication/basebackup.h
index 923a651cac..f5f044dacd 100644
--- a/src/include/replication/basebackup.h
+++ b/src/include/replication/basebackup.h
@@ -14,9 +14,6 @@
#include "nodes/replnodes.h"
-struct backup_manifest_info; /* avoid including backup_manifest.h */
-
-
/*
* Minimum and maximum values of MAX_RATE option in BASE_BACKUP command.
*/
@@ -33,7 +30,4 @@ typedef struct
extern void SendBaseBackup(BaseBackupCmd *cmd);
-extern int64 sendTablespace(char *path, char *oid, bool sizeonly,
- struct backup_manifest_info *manifest);
-
#endif /* _BASEBACKUP_H */
--
2.24.2 (Apple Git-127)
v1-0006-Convert-libpq-related-code-to-a-bbsink.patch
From c9f91a3a4b8094d60d43a6f1f74e1d7e0399a809 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 6 May 2020 12:08:21 -0400
Subject: [PATCH v1 06/11] Convert libpq-related code to a bbsink.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/backup_manifest.c | 18 +-
src/backend/replication/basebackup.c | 286 +++++--------------
src/backend/replication/basebackup_libpq.c | 309 +++++++++++++++++++++
src/include/replication/backup_manifest.h | 4 +-
src/include/replication/basebackup_sink.h | 3 +
6 files changed, 388 insertions(+), 233 deletions(-)
create mode 100644 src/backend/replication/basebackup_libpq.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 25d56478f4..6adc396501 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_libpq.o \
basebackup_sink.o \
repl_gram.o \
slot.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index d2f454c60e..61f6f2c12b 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -283,9 +284,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -321,19 +321,15 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file: %m")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
+ * Send the backup manifest.
*
* We choose to read back the data from the temporary file in chunks of
* size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
* size, so it seems to make sense to match that value here.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
char manifestbuf[BLCKSZ];
@@ -347,12 +343,10 @@ SendBackupManifest(backup_manifest_info *manifest)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, manifestbuf, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 73f6413faa..dea547081a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,12 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -30,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -58,24 +56,23 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
static void update_basebackup_progress(int64 delta);
@@ -272,6 +269,7 @@ perform_base_backup(basebackup_options *opt)
backup_manifest_info manifest;
int datadirpathlen;
List *tablespaces = NIL;
+ bbsink *sink = bbsink_libpq_new();
backup_total = 0;
backup_streamed = 0;
@@ -354,10 +352,10 @@ perform_base_backup(basebackup_options *opt)
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
+ tmp->size = sendDir(sink, ".", 1, true, tablespaces, true, NULL,
NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
backup_total += tmp->size;
}
@@ -378,11 +376,8 @@ perform_base_backup(basebackup_options *opt)
pgstat_progress_update_multi_param(3, index, val);
}
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, startptr, starttli, tablespaces);
/* Setup and activate network throttling, if client requested it */
if (opt->maxrate > 0)
@@ -412,33 +407,28 @@ perform_base_backup(basebackup_options *opt)
foreach(lc, tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
+ sendDir(sink, ".", 1, false, tablespaces, sendtblspclinks,
&manifest, NULL);
/* ... and pg_control after everything else. */
@@ -447,24 +437,30 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
Assert(lnext(tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
+ bbsink_end_archive(sink);
tblspc_streamed++;
pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
@@ -639,17 +635,14 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
while ((cnt = fread(buf, 1,
Min(sizeof(buf), wal_segment_size - len),
fp)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
+ bbsink_archive_contents(sink, buf, cnt);
update_basebackup_progress(cnt);
len += cnt;
@@ -684,7 +677,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -707,23 +700,22 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -949,151 +941,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
@@ -1121,9 +973,8 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
+ bbsink_archive_contents(sink, content, len);
update_basebackup_progress(len);
/* Pad to a multiple of the tar block size. */
@@ -1133,7 +984,7 @@ sendFileWithContent(const char *filename, const char *content,
char buf[TAR_BLOCK_SIZE];
MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
+ bbsink_archive_contents(sink, buf, pad);
update_basebackup_progress(pad);
}
@@ -1150,7 +1001,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1180,11 +1031,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1203,8 +1054,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1364,8 +1215,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1382,8 +1233,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1396,15 +1247,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1435,7 +1286,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1459,7 +1310,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1491,7 +1342,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1499,7 +1350,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1576,7 +1427,7 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
@@ -1609,7 +1460,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1762,10 +1613,7 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
+ bbsink_archive_contents(sink, buf, cnt);
update_basebackup_progress(cnt);
/* Also feed it to the checksum machinery. */
@@ -1794,7 +1642,7 @@ sendFile(const char *readfilename, const char *tarfilename,
while (len < statbuf->st_size)
{
cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
+ bbsink_archive_contents(sink, buf, cnt);
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
update_basebackup_progress(cnt);
len += cnt;
@@ -1811,7 +1659,7 @@ sendFile(const char *readfilename, const char *tarfilename,
if (pad > 0)
{
MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
+ bbsink_archive_contents(sink, buf, pad);
update_basebackup_progress(pad);
}
@@ -1838,7 +1686,7 @@ sendFile(const char *readfilename, const char *tarfilename,
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
char h[TAR_BLOCK_SIZE];
@@ -1869,7 +1717,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
+ bbsink_archive_contents(sink, h, sizeof(h));
update_basebackup_progress(sizeof(h));
}
diff --git a/src/backend/replication/basebackup_libpq.c b/src/backend/replication/basebackup_libpq.c
new file mode 100644
index 0000000000..f0024a881a
--- /dev/null
+++ b/src/backend/replication/basebackup_libpq.c
@@ -0,0 +1,309 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_libpq.c
+ * send archives and backup manifest to client via libpq
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_libpq.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_libpq_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces);
+static void bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_libpq_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_libpq_end_archive(bbsink *sink);
+static void bbsink_libpq_begin_manifest(bbsink *sink);
+static void bbsink_libpq_manifest_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_libpq_end_manifest(bbsink *sink);
+static void bbsink_libpq_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+
+const bbsink_ops bbsink_libpq_ops = {
+ .begin_backup = bbsink_libpq_begin_backup,
+ .begin_archive = bbsink_libpq_begin_archive,
+ .archive_contents = bbsink_libpq_archive_contents,
+ .end_archive = bbsink_libpq_end_archive,
+ .begin_manifest = bbsink_libpq_begin_manifest,
+ .manifest_contents = bbsink_libpq_manifest_contents,
+ .end_manifest = bbsink_libpq_end_manifest,
+ .end_backup = bbsink_libpq_end_backup
+};
+
+/*
+ * Create a new 'libpq' bbsink.
+ */
+bbsink *
+bbsink_libpq_new(void)
+{
+ bbsink *sink = palloc(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_libpq_ops;
+ sink->bbs_next = NULL;
+
+ return sink;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_libpq_begin_backup(bbsink *sink, XLogRecPtr startptr, TimeLineID starttli,
+ List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is sent as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_libpq_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ SendCopyData(data, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_libpq_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_libpq_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ SendCopyData(data, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_libpq_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_libpq_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 06b114f3d7..043635b31c 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,6 +47,6 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 050cf1180d..a8df937957 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -173,4 +173,7 @@ extern void bbsink_forward_end_manifest(bbsink *sink);
extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_libpq_new(void);
+
#endif
--
2.24.2 (Apple Git-127)
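To make the bbsink contract a bit more concrete, here is a minimal sketch of
what a chained sink could look like. This is illustration only, not part of
the patch set: the name bbsink_trace_new and its behavior are invented, and
it assumes that basebackup_sink.h supplies a bbsink_forward_* helper for
every callback (the end_manifest and end_backup forwarders are visible in
the hunk above; the rest follow the same pattern).

/*
 * Hypothetical example, not part of the patch set: a bbsink that logs
 * archive boundaries and forwards everything to the next sink.
 */
#include "postgres.h"
#include "replication/basebackup_sink.h"

static void bbsink_trace_begin_archive(bbsink *sink, const char *archive_name);

static const bbsink_ops bbsink_trace_ops = {
    .begin_backup = bbsink_forward_begin_backup,
    .begin_archive = bbsink_trace_begin_archive,
    .archive_contents = bbsink_forward_archive_contents,
    .end_archive = bbsink_forward_end_archive,
    .begin_manifest = bbsink_forward_begin_manifest,
    .manifest_contents = bbsink_forward_manifest_contents,
    .end_manifest = bbsink_forward_end_manifest,
    .end_backup = bbsink_forward_end_backup
};

bbsink *
bbsink_trace_new(bbsink *next)
{
    bbsink     *sink = palloc(sizeof(bbsink));

    /* same constructor pattern as bbsink_libpq_new, plus a successor */
    *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_trace_ops;
    sink->bbs_next = next;

    return sink;
}

/* Log the archive name, then delegate to the next sink in the chain. */
static void
bbsink_trace_begin_archive(bbsink *sink, const char *archive_name)
{
    elog(DEBUG1, "beginning archive \"%s\"", archive_name);
    bbsink_forward_begin_archive(sink, archive_name);
}

The point being that an implementation only overrides the callbacks it cares
about; everything else is a one-line forward to bbs_next.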
Attachment: v1-0009-Introduce-bbarchiver-abstraction.patch (application/octet-stream)
From 653b2a7d84052d5c60e6bf8527c9c251c873413c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 16:43:16 -0400
Subject: [PATCH v1 09/11] Introduce bbarchiver abstraction.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup_archiver.c | 119 +++++++++++
src/include/replication/basebackup_archiver.h | 195 ++++++++++++++++++
3 files changed, 315 insertions(+)
create mode 100644 src/backend/replication/basebackup_archiver.c
create mode 100644 src/include/replication/basebackup_archiver.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 7de4f82882..aacccd350d 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_archiver.o \
basebackup_libpq.o \
basebackup_progress.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup_archiver.c b/src/backend/replication/basebackup_archiver.c
new file mode 100644
index 0000000000..045a8a088e
--- /dev/null
+++ b/src/backend/replication/basebackup_archiver.c
@@ -0,0 +1,119 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_archiver.c
+ * general supporting code for basebackup archiver implementations
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_archiver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "replication/basebackup_archiver.h"
+
+/* Pass begin_tablespace callback to next bbarchiver. */
+void
+bbarchiver_forward_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_begin_tablespace(archiver->bba_next, tsinfo);
+}
+
+/* Pass end_tablespace callback to next bbarchiver. */
+void
+bbarchiver_forward_end_tablespace(bbarchiver *archiver)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_end_tablespace(archiver->bba_next);
+}
+
+/* Pass begin_file callback to next bbarchiver. */
+void
+bbarchiver_forward_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_begin_file(archiver->bba_next, relative_path, statbuf);
+}
+
+/* Pass file_contents callback to next bbarchiver. */
+void
+bbarchiver_forward_file_contents(bbarchiver *archiver, const char *data, size_t len)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_file_contents(archiver->bba_next, data, len);
+}
+
+/* Pass end_file callback to next bbarchiver. */
+void
+bbarchiver_forward_end_file(bbarchiver *archiver)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_end_file(archiver->bba_next);
+}
+
+/* Pass directory callback to next bbarchiver. */
+void
+bbarchiver_forward_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_directory(archiver->bba_next, relative_path, statbuf);
+}
+
+/* Pass symbolic_link callback to next bbarchiver. */
+void
+bbarchiver_forward_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_symbolic_link(archiver->bba_next, relative_path, linktarget, statbuf);
+}
+
+/* Ignore begin_tablespace callback. */
+void
+bbarchiver_noop_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ /* Do nothing */
+}
+
+/* Ignore end_tablespace callback. */
+void
+bbarchiver_noop_end_tablespace(bbarchiver *archiver)
+{
+ /* Do nothing */
+}
+
+/* Ignore begin_file callback. */
+void
+bbarchiver_noop_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ /* Do nothing */
+}
+
+/* Ignore end_file callback. */
+void
+bbarchiver_noop_end_file(bbarchiver *archiver)
+{
+ /* Do nothing */
+}
+
+/* Ignore directory callback. */
+void
+bbarchiver_noop_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ /* Do nothing */
+}
+
+/* Ignore symbolic_link callback. */
+void
+bbarchiver_noop_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ /* Do nothing */
+}
diff --git a/src/include/replication/basebackup_archiver.h b/src/include/replication/basebackup_archiver.h
new file mode 100644
index 0000000000..fce0afa167
--- /dev/null
+++ b/src/include/replication/basebackup_archiver.h
@@ -0,0 +1,195 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_archiver.h
+ * iterate over files, directories, and symbolic links encountered as
+ * part of the base backup process
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_archiver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_ARCHIVER_H
+#define BASEBACKUP_ARCHIVER_H
+
+#include <sys/stat.h>
+
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+struct bbarchiver;
+struct bbarchiver_ops;
+typedef struct bbarchiver bbarchiver;
+typedef struct bbarchiver_ops bbarchiver_ops;
+
+/*
+ * Common data for any type of basebackup archiver.
+ *
+ * 'bba_ops' is the relevant callback table.
+ *
+ * 'bba_next' is a pointer to another bbarchiver to which this bbarchiver is
+ * forwarding some or all operations.
+ *
+ * If a bbarchiver needs to store additional state, it can allocate a larger
+ * structure whose first element is a bbarchiver.
+ */
+struct bbarchiver
+{
+ const bbarchiver_ops *bba_ops;
+ bbarchiver *bba_next;
+};
+
+/*
+ * Callbacks for a backup archiver.
+ *
+ * Except as otherwise noted, all of these callbacks are required. If a particular
+ * callback just needs to forward the call to archiver->bba_next, use
+ * bbarchiver_forward_<callback_name> as the callback. If a particular (required)
+ * callback doesn't need to do anything at all, use bbarchiver_noop_<callback_name>
+ * as the callback.
+ *
+ * Callers should always invoke these callbacks via the bbarchiver_*
+ * inline functions rather than calling them directly.
+ */
+struct bbarchiver_ops
+{
+ /* These callbacks are invoked just before and after visiting each tablespace. */
+ void (*begin_tablespace)(bbarchiver *archiver, tablespaceinfo *tsinfo);
+ void (*end_tablespace)(bbarchiver *archiver);
+
+ /* This callback is invoked each time we begin visiting a plain file. */
+ void (*begin_file)(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf);
+
+ /*
+ * This callback is invoked one or more times for each plain file, with the
+ * contents of the file passed to it chunk by chunk.
+ *
+ * It is optional. If NULL, the file is not read.
+ */
+ void (*file_contents)(bbarchiver *archiver, const char *data,
+ size_t len);
+
+ /* This callback is invoked each time we finish visiting a plain file. */
+ void (*end_file)(bbarchiver *archiver);
+
+ /* This method gets called each time we visit a directory. */
+ void (*directory)(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf);
+
+ /* This method gets called each time we visit a symbolic link. */
+ void (*symbolic_link)(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf);
+};
+
+/* Dummy callbacks for when a bbarchiver wants to forward operations. */
+extern void bbarchiver_forward_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+extern void bbarchiver_forward_end_tablespace(bbarchiver *archiver);
+extern void bbarchiver_forward_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+extern void bbarchiver_forward_file_contents(bbarchiver *archiver,
+ const char *data, size_t len);
+extern void bbarchiver_forward_end_file(bbarchiver *archiver);
+extern void bbarchiver_forward_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+extern void bbarchiver_forward_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+
+/* Dummy callbacks for when a bbarchiver wants to do nothing. */
+extern void bbarchiver_noop_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+extern void bbarchiver_noop_end_tablespace(bbarchiver *archiver);
+extern void bbarchiver_noop_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+/* if there's nothing to do for file contents, omit callback! */
+extern void bbarchiver_noop_end_file(bbarchiver *archiver);
+extern void bbarchiver_noop_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+extern void bbarchiver_noop_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+
+/* Begin visiting a tablespace. */
+static inline void
+bbarchiver_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ Assert(archiver->bba_ops->begin_tablespace != NULL);
+ archiver->bba_ops->begin_tablespace(archiver, tsinfo);
+}
+
+/* Finish visiting a tablespace. */
+static inline void
+bbarchiver_end_tablespace(bbarchiver *archiver)
+{
+ Assert(archiver->bba_ops->end_tablespace != NULL);
+ archiver->bba_ops->end_tablespace(archiver);
+}
+
+/* Begin visiting a plain file. */
+static inline void
+bbarchiver_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_ops->begin_file != NULL);
+ archiver->bba_ops->begin_file(archiver, relative_path, statbuf);
+}
+
+/* Does this archiver need the contents of the files? */
+static inline bool
+bbarchiver_needs_file_contents(bbarchiver *archiver)
+{
+ return archiver->bba_ops->file_contents != NULL;
+}
+
+/*
+ * Process contents of a plain file.
+ *
+ * Don't call this unless bbarchiver_needs_file_contents returns true.
+ */
+static inline void
+bbarchiver_file_contents(bbarchiver *archiver, const char *data, size_t len)
+{
+ Assert(archiver->bba_ops->file_contents != NULL);
+ archiver->bba_ops->file_contents(archiver, data, len);
+}
+
+/* Finish visiting a plain file. */
+static inline void
+bbarchiver_end_file(bbarchiver *archiver)
+{
+ Assert(archiver->bba_ops->end_file != NULL);
+ archiver->bba_ops->end_file(archiver);
+}
+
+/* Visit a directory. */
+static inline void
+bbarchiver_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_ops->directory != NULL);
+ archiver->bba_ops->directory(archiver, relative_path, statbuf);
+}
+
+/* Visit a symbolic link. */
+static inline void
+bbarchiver_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ Assert(archiver->bba_ops->symbolic_link != NULL);
+ archiver->bba_ops->symbolic_link(archiver, relative_path, linktarget, statbuf);
+}
+
+/* Constructors for various types of archivers. */
+extern bbarchiver *bbarchiver_tar_new(bbsink *sink);
+extern bbarchiver *bbarchiver_tarsize_new(void);
+
+#endif
--
2.24.2 (Apple Git-127)
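Along the same lines, here's a sketch of a trivial bbarchiver built on the
callbacks above -- one that counts plain files while passing everything
through. Again this is invented for illustration (bbarchiver_count_new is
not in the patches), but every bbarchiver_forward_* function it uses is
defined in basebackup_archiver.c above. One subtlety worth calling out: a
pass-through archiver must install a file_contents callback, because
bbarchiver_needs_file_contents() is only consulted on the head of the chain,
and returning false there would keep downstream archivers from ever seeing
any data.

/*
 * Hypothetical example, not part of the patch set: a bbarchiver that
 * counts the plain files it visits and forwards everything downstream.
 */
#include "postgres.h"
#include "replication/basebackup_archiver.h"

typedef struct bbarchiver_count
{
    bbarchiver  base;           /* common data; must be the first member */
    uint64      nfiles;         /* number of plain files seen so far */
} bbarchiver_count;

static void bbarchiver_count_begin_file(bbarchiver *archiver,
                                        const char *relative_path,
                                        struct stat *statbuf);

static const bbarchiver_ops bbarchiver_count_ops = {
    .begin_tablespace = bbarchiver_forward_begin_tablespace,
    .end_tablespace = bbarchiver_forward_end_tablespace,
    .begin_file = bbarchiver_count_begin_file,
    /* must forward, or archivers further down never see file data */
    .file_contents = bbarchiver_forward_file_contents,
    .end_file = bbarchiver_forward_end_file,
    .directory = bbarchiver_forward_directory,
    .symbolic_link = bbarchiver_forward_symbolic_link
};

bbarchiver *
bbarchiver_count_new(bbarchiver *next)
{
    bbarchiver_count *archiver = palloc0(sizeof(bbarchiver_count));

    archiver->base.bba_ops = &bbarchiver_count_ops;
    archiver->base.bba_next = next;

    return &archiver->base;
}

/* Bump the counter, then let the next archiver visit the file as well. */
static void
bbarchiver_count_begin_file(bbarchiver *archiver, const char *relative_path,
                            struct stat *statbuf)
{
    ((bbarchiver_count *) archiver)->nfiles++;
    bbarchiver_forward_begin_file(archiver, relative_path, statbuf);
}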
Attachment: v1-0008-Convert-progress-reporting-code-to-a-bbsink.patch (application/octet-stream)
From 49239fdc27ac26ef0acd5500c8486f2aeccf1a39 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 7 May 2020 15:18:39 -0400
Subject: [PATCH v1 08/11] Convert progress-reporting code to a bbsink.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 104 +------
src/backend/replication/basebackup_progress.c | 287 ++++++++++++++++++
src/include/replication/basebackup_sink.h | 8 +
4 files changed, 304 insertions(+), 96 deletions(-)
create mode 100644 src/backend/replication/basebackup_progress.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 58b6c228bb..7de4f82882 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_libpq.o \
+ basebackup_progress.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6fe0da2f49..1655806f1f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -18,7 +18,6 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "common/file_perm.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
@@ -74,7 +73,6 @@ static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
/* Was the backup currently in-progress initiated in recovery mode? */
@@ -109,15 +107,6 @@ static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -252,26 +241,14 @@ perform_base_backup(basebackup_options *opt)
int datadirpathlen;
List *tablespaces = NIL;
bbsink *sink = bbsink_libpq_new();
+ bbsink *progress_sink;
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
-
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -288,8 +265,7 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+ basebackup_progress_wait_checkpoint();
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
labelfile, &tablespaces,
tblspc_map_file, opt->sendtblspcmapfile);
@@ -305,7 +281,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -330,8 +305,7 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
foreach(lc, tablespaces)
{
@@ -343,25 +317,9 @@ perform_base_backup(basebackup_options *opt)
else
tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
}
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
/* notify basebackup sink about start of backup */
bbsink_begin_backup(sink, startptr, starttli, tablespaces);
@@ -423,14 +381,9 @@ perform_base_backup(basebackup_options *opt)
}
else
bbsink_end_archive(sink);
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -456,8 +409,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -605,7 +557,6 @@ perform_base_backup(basebackup_options *opt)
{
CheckXLogRemoved(segno, tli);
bbsink_archive_contents(sink, buf, cnt);
- update_basebackup_progress(cnt);
len += cnt;
@@ -692,7 +643,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -936,7 +887,6 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
_tarWriteHeader(sink, filename, NULL, &statbuf, false);
bbsink_archive_contents(sink, content, len);
- update_basebackup_progress(len);
/* Pad to a multiple of the tar block size. */
pad = tarPaddingBytesRequired(len);
@@ -946,7 +896,6 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
MemSet(buf, 0, pad);
bbsink_archive_contents(sink, buf, pad);
- update_basebackup_progress(pad);
}
pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
@@ -1575,7 +1524,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
}
bbsink_archive_contents(sink, buf, cnt);
- update_basebackup_progress(cnt);
/* Also feed it to the checksum machinery. */
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
@@ -1604,7 +1552,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
cnt = Min(sizeof(buf), statbuf->st_size - len);
bbsink_archive_contents(sink, buf, cnt);
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
- update_basebackup_progress(cnt);
len += cnt;
}
}
@@ -1619,7 +1566,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
{
MemSet(buf, 0, pad);
bbsink_archive_contents(sink, buf, pad);
- update_basebackup_progress(pad);
}
FreeFile(fp);
@@ -1677,7 +1623,6 @@ _tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
}
bbsink_archive_contents(sink, h, sizeof(h));
- update_basebackup_progress(sizeof(h));
}
return sizeof(h);
@@ -1698,36 +1643,3 @@ convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
-
-/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
- */
-static void
-update_basebackup_progress(int64 delta)
-{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
-}
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..1dcb9d8390
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,287 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress reporting. Data is forwarded to
+ * the next base backup sink in the chain and the number of bytes
+ * forwarded is used to update shared memory progress counters.
+ *
+ * Progress reporting requires extra callbacks that most base backup sinks
+ * don't need. Rather than cramming those into the interface, we just have a few
+ * extra functions here that basebackup.c can call. (We could put the logic
+ * directly into that file as it's fairly simple, but it seems cleaner to
+ * have it all in one place.)
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_progress
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Are we estimating the backup size? */
+ bool estimate_backup_size;
+
+ /*
+ * Estimated total amount of backup data that will be streamed.
+ * -1 means that the size is not estimated.
+ */
+ int64 backup_total;
+
+ /* Amount of backup data already streamed */
+ int64 backup_streamed;
+
+ /* Total number of tablespaces. */
+ int tblspc_total;
+
+ /* Number of those that have been streamed. */
+ int tblspc_streamed;
+} bbsink_progress;
+
+static void bbsink_progress_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli,
+ List *tablespaces);
+static void bbsink_progress_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress reporting and forwards
+ * data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink_progress *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_progress));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_progress_ops;
+ sink->base.bbs_next = next;
+
+ sink->estimate_backup_size = estimate_backup_size;
+ sink->backup_total = -1;
+ sink->backup_streamed = 0;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of
+ * the backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ sink->backup_total);
+
+ return &sink->base;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ /* Save count of tablespaces. */
+ mysink->tblspc_total = list_length(tablespaces);
+
+ /*
+ * If the sizes of the individual tablespaces are being calculated, add
+ * them up to get a total size.
+ */
+ if (mysink->estimate_backup_size)
+ {
+ ListCell *lc;
+
+ mysink->backup_total = 0;
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ mysink->backup_total += ti->size;
+ }
+ }
+
+ /*
+ * Report that we are now streaming database files as a base backup.
+ * Also advertise the number of tablespaces, and, if known, the estimated
+ * total backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ val[1] = mysink->backup_total;
+ val[2] = mysink->tblspc_total;
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_backup(sink->bbs_next, startptr, starttli, tablespaces);
+}
+
+/*
+ * End-of-archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ /*
+ * We assume that the end of an archive means we've reached the end of a
+ * tablespace. That's not ideal: we might want to decouple those two
+ * concepts better.
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (mysink->tblspc_streamed < mysink->tblspc_total)
+ {
+ mysink->tblspc_streamed++;
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ mysink->tblspc_streamed);
+ }
+}
+
+/*
+ * First pass archive contents to next sink, and then perform progress updates.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ /* First forward to next sink. */
+ Assert(sink->bbs_next != NULL);
+ bbsink_archive_contents(sink->bbs_next, data, len);
+
+ /* Now increment count of what was sent by length of data. */
+ mysink->backup_streamed += len;
+ val[nparam++] = mysink->backup_streamed;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (mysink->backup_total > -1 &&
+ mysink->backup_streamed > mysink->backup_total)
+ {
+ mysink->backup_total = mysink->backup_streamed;
+ val[nparam++] = mysink->backup_total;
+ }
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ Assert(mysink->tblspc_streamed >= mysink->tblspc_total - 1);
+ Assert(mysink->tblspc_streamed <= mysink->tblspc_total);
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+
+ /*
+ * We report having finished all tablespaces at this point, even if
+ * the archive for the main tablespace is still open, because what's
+ * going to be added is WAL files, not files that are really from the
+ * main tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = mysink->tblspc_streamed = mysink->tblspc_total;
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index bc1710e2eb..bf2d71fafa 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -175,6 +175,14 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_libpq_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
#endif
--
2.24.2 (Apple Git-127)
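Finally, to show how these pieces stack: with 0008 applied, assembling the
sink chain in perform_base_backup() is just nested constructor calls,
innermost first, and a future stage would be one more line. A sketch, where
bbsink_gzip_new and compression_level are invented names for a hypothetical
compression stage:

/* Innermost sink streams to the client via libpq ... */
bbsink     *sink = bbsink_libpq_new();

/* ... optionally rate-limited ... */
if (opt->maxrate > 0)
    sink = bbsink_throttle_new(sink, opt->maxrate);

/* ... hypothetically compressed (bbsink_gzip_new is invented here) ... */
sink = bbsink_gzip_new(sink, compression_level);

/*
 * ... and progress reporting outermost, so the byte counts it reports
 * line up with the uncompressed size estimate.
 */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);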
Attachment: v1-0010-Create-and-use-bbarchiver-implementations-for-tar.patch (application/octet-stream)
From 84ea8b5d51b7dcac17b250c0deaeb0d652471f46 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 7 May 2020 18:10:30 -0400
Subject: [PATCH v1 10/11] Create and use bbarchiver implementations for tar
and tar sizing.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 428 ++++++++++-------------
src/backend/replication/basebackup_tar.c | 266 ++++++++++++++
3 files changed, 454 insertions(+), 241 deletions(-)
create mode 100644 src/backend/replication/basebackup_tar.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index aacccd350d..6b3c77f2c0 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -21,6 +21,7 @@ OBJS = \
basebackup_libpq.o \
basebackup_progress.o \
basebackup_sink.o \
+ basebackup_tar.o \
basebackup_throttle.o \
repl_gram.o \
slot.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1655806f1f..c606e7cf58 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,7 +26,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
-#include "replication/basebackup_sink.h"
+#include "replication/basebackup_archiver.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -55,20 +55,26 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
- struct backup_manifest_info *manifest);
-static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
- List *tablespaces, bool sendtblspclinks,
- backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
- struct stat *statbuf, bool missing_ok, Oid dboid,
- backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(bbsink *sink, const char *filename,
- const char *content,
- backup_manifest_info *manifest);
-static int64 _tarWriteHeader(bbsink *sink, const char *filename,
- const char *linktarget, struct stat *statbuf,
- bool sizeonly);
+static void archive_database_cluster(List *tablespaces, bbarchiver *archiver,
+ StringInfo labelfile,
+ StringInfo tblspc_map_file,
+ bool leave_main_tablespace_open,
+ backup_manifest_info *manifest);
+static void archive_tablespace(bbarchiver *archiver, char *path, char *oid,
+ struct backup_manifest_info *manifest);
+static void archive_directory(bbarchiver *archiver, const char *path,
+ int basepathlen, List *tablespaces,
+ bool sendtblspclinks,
+ backup_manifest_info *manifest,
+ const char *spcoid);
+static void archive_file(bbarchiver *archiver, const char *readfilename,
+ const char *tarfilename, struct stat *statbuf,
+ bool missing_ok, Oid dboid,
+ backup_manifest_info *manifest, const char *spcoid);
+static void archive_file_with_content(bbarchiver *archiver,
+ const char *filename,
+ const char *content,
+ backup_manifest_info *manifest);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -242,6 +248,8 @@ perform_base_backup(basebackup_options *opt)
List *tablespaces = NIL;
bbsink *sink = bbsink_libpq_new();
bbsink *progress_sink;
+ bbarchiver *archiver;
+ bbarchiver *size_archiver;
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
@@ -250,6 +258,10 @@ perform_base_backup(basebackup_options *opt)
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
+ /* Set up tar archiving. */
+ archiver = bbarchiver_tar_new(sink);
+ size_archiver = bbarchiver_tarsize_new();
+
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
@@ -269,6 +281,8 @@ perform_base_backup(basebackup_options *opt)
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
labelfile, &tablespaces,
tblspc_map_file, opt->sendtblspcmapfile);
+ if (!opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -279,7 +293,6 @@ perform_base_backup(basebackup_options *opt)
PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
{
- ListCell *lc;
tablespaceinfo *ti;
/*
@@ -299,89 +312,26 @@ perform_base_backup(basebackup_options *opt)
ti->size = -1;
tablespaces = lappend(tablespaces, ti);
- /*
- * Calculate the total backup size by summing up the size of each
- * tablespace
- */
+ /* estimate sizes of all tablespaces, if PROGRESS option was given */
if (opt->progress)
{
basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
- {
- tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
-
- if (tmp->path == NULL)
- tmp->size = sendDir(sink, ".", 1, true, tablespaces, true, NULL,
- NULL);
- else
- tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
- NULL);
- }
+ archive_database_cluster(tablespaces, size_archiver, labelfile,
+ tblspc_map_file, false, NULL);
}
/* notify basebackup sink about start of backup */
bbsink_begin_backup(sink, startptr, starttli, tablespaces);
- /* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
-
- if (ti->path == NULL)
- {
- struct stat statbuf;
- bool sendtblspclinks = true;
-
- bbsink_begin_archive(sink, "base.tar");
-
- /* In the main tar, include the backup_label first... */
- sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
- &manifest);
-
- /* Then the tablespace_map file, if required... */
- if (opt->sendtblspcmapfile)
- {
- sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
- &manifest);
- sendtblspclinks = false;
- }
-
- /* Then the bulk of the files... */
- sendDir(sink, ".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
-
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
- false, InvalidOid, &manifest, NULL);
- }
- else
- {
- char *archive_name = psprintf("%s.tar", ti->oid);
-
- bbsink_begin_archive(sink, archive_name);
-
- sendTablespace(sink, ti->path, ti->oid, false, &manifest);
- }
-
- /*
- * If we're including WAL, and this is the main data directory we
- * don't treat this as the end of the tablespace. Instead, we will
- * include the xlog files below and stop afterwards. This is safe
- * since the main data directory is always sent *last*.
- */
- if (opt->includewal && ti->path == NULL)
- {
- Assert(lnext(tablespaces, lc) == NULL);
- }
- else
- bbsink_end_archive(sink);
- }
+ /*
+ * Back up all of the tablespaces.
+ *
+ * If the backup is to include WAL, leave the main tablespace open,
+ * so that we can archive the WAL files as well.
+ */
+ archive_database_cluster(tablespaces, archiver, labelfile,
+ tblspc_map_file, opt->includewal, &manifest);
basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
@@ -549,14 +499,13 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
-
+ bbarchiver_begin_file(archiver, pathbuf, &statbuf);
while ((cnt = fread(buf, 1,
Min(sizeof(buf), wal_segment_size - len),
fp)) > 0)
{
CheckXLogRemoved(segno, tli);
- bbsink_archive_contents(sink, buf, cnt);
+ bbarchiver_file_contents(archiver, buf, cnt);
len += cnt;
@@ -574,11 +523,7 @@ perform_base_backup(basebackup_options *opt)
errmsg("unexpected WAL file size \"%s\"", walFileName)));
}
- /*
- * wal_segment_size is a multiple of TAR_BLOCK_SIZE, so no need
- * for padding.
- */
- Assert(wal_segment_size % TAR_BLOCK_SIZE == 0);
+ bbarchiver_end_file(archiver);
FreeFile(fp);
@@ -589,7 +534,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(sink, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "", &manifest);
}
/*
@@ -612,12 +557,12 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
- &manifest, NULL);
+ archive_file(archiver, pathbuf, pathbuf, &statbuf, false,
+ InvalidOid, &manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(sink, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "", &manifest);
}
bbsink_end_archive(sink);
@@ -646,6 +591,74 @@ perform_base_backup(basebackup_options *opt)
basebackup_progress_done();
}
+/*
+ * Iterate over the entire cluster and feed each tablespace to the archiver
+ * in turn.
+ */
+static void
+archive_database_cluster(List *tablespaces, bbarchiver *archiver,
+ StringInfo labelfile, StringInfo tblspc_map_file,
+ bool leave_main_tablespace_open,
+ backup_manifest_info *manifest)
+{
+ ListCell *lc;
+
+ /* Send off our tablespaces one by one */
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ bbarchiver_begin_tablespace(archiver, ti);
+
+ if (ti->path == NULL)
+ {
+ struct stat statbuf;
+ bool sendtblspclinks = true;
+
+ /* For the main tablespace, archive the backup_label first... */
+ archive_file_with_content(archiver, BACKUP_LABEL_FILE,
+ labelfile->data, manifest);
+
+ /* Then the tablespace_map file, if present... */
+ if (tblspc_map_file != NULL)
+ {
+ archive_file_with_content(archiver, TABLESPACE_MAP,
+ tblspc_map_file->data,
+ manifest);
+ sendtblspclinks = false;
+ }
+
+ /* Then the bulk of the files... */
+ archive_directory(archiver, ".", 1, tablespaces,
+ sendtblspclinks, manifest, NULL);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ archive_file(archiver, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE,
+ &statbuf, false, InvalidOid, manifest, NULL);
+ }
+ else
+ archive_tablespace(archiver, ti->path, ti->oid, manifest);
+
+ /*
+ * If we were asked to leave the main tablespace open, then do so.
+ * This is safe since the main data directory is always sent *last*,
+ * so we'll not try to begin another tablespace without ending this
+ * one.
+ */
+ if (leave_main_tablespace_open && ti->path == NULL)
+ {
+ Assert(lnext(tablespaces, lc) == NULL);
+ }
+ else
+ bbarchiver_end_tablespace(archiver);
+ }
+}
+
/*
* list_sort comparison function, to compare log/seg portion of WAL segment
* filenames, ignoring the timeline portion.
@@ -854,18 +867,20 @@ SendBaseBackup(BaseBackupCmd *cmd)
}
/*
- * Inject a file with given name and content in the output tar stream.
+ * Feed the archiver a file that does not actually exist in the source
+ * directory. We use this to inject things like the backup_label file into
+ * the backup.
*/
static void
-sendFileWithContent(bbsink *sink, const char *filename, const char *content,
- backup_manifest_info *manifest)
+archive_file_with_content(bbarchiver *archiver, const char *filename,
+ const char *content, backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
- len;
+ int len;
pg_checksum_context checksum_ctx;
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
+ if (manifest != NULL)
+ pg_checksum_init(&checksum_ctx, manifest->checksum_type);
len = strlen(content);
@@ -885,22 +900,17 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- bbsink_archive_contents(sink, content, len);
+ bbarchiver_begin_file(archiver, filename, &statbuf);
+ if (bbarchiver_needs_file_contents(archiver))
+ bbarchiver_file_contents(archiver, content, len);
+ bbarchiver_end_file(archiver);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (manifest != NULL)
{
- char buf[TAR_BLOCK_SIZE];
-
- MemSet(buf, 0, pad);
- bbsink_archive_contents(sink, buf, pad);
+ pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
+ AddFileToBackupManifest(manifest, NULL, filename, len,
+ (pg_time_t) statbuf.st_mtime, &checksum_ctx);
}
-
- pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
- AddFileToBackupManifest(manifest, NULL, filename, len,
- (pg_time_t) statbuf.st_mtime, &checksum_ctx);
}
/*
@@ -910,11 +920,10 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
-static int64
-sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
- backup_manifest_info *manifest)
+static void
+archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
+ backup_manifest_info *manifest)
{
- int64 size;
char pathbuf[MAXPGPATH];
struct stat statbuf;
@@ -938,17 +947,14 @@ sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
pathbuf)));
/* If the tablespace went away while scanning, it's no error. */
- return 0;
+ return;
}
- size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ bbarchiver_directory(archiver, TABLESPACE_VERSION_DIRECTORY, &statbuf);
/* Send all the files in the tablespace version directory */
- size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
- spcoid);
-
- return size;
+ archive_directory(archiver, pathbuf, strlen(path), NIL, true, manifest,
+ spcoid);
}
/*
@@ -963,16 +969,15 @@ sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
* information in the tar file. If not, we can skip that
* as it will be sent separately in the tablespace_map file.
*/
-static int64
-sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
- List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
- const char *spcoid)
+static void
+archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
+ List *tablespaces, bool sendtblspclinks,
+ backup_manifest_info *manifest, const char *spcoid)
{
DIR *dir;
struct dirent *de;
char pathbuf[MAXPGPATH * 2];
struct stat statbuf;
- int64 size = 0;
const char *lastDir; /* Split last dir from parent path. */
bool isDbDir = false; /* Does this directory contain relations? */
@@ -1125,8 +1130,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
excludeFound = true;
break;
}
@@ -1143,8 +1148,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
continue;
}
@@ -1157,15 +1162,15 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, "./pg_wal/archive_status",
+ &statbuf);
continue; /* don't recurse into pg_wal */
}
@@ -1196,8 +1201,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ bbarchiver_symbolic_link(archiver, pathbuf + basepathlen + 1,
+ linkpath, &statbuf);
#else
/*
@@ -1220,8 +1225,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1252,36 +1257,21 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
- sendtblspclinks, manifest, spcoid);
+ archive_directory(archiver, pathbuf, basepathlen, tablespaces,
+ sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
{
- bool sent = false;
-
- if (!sizeonly)
- sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
- true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
- manifest, spcoid);
-
- if (sent || sizeonly)
- {
- /* Add size. */
- size += statbuf.st_size;
-
- /* Pad to a multiple of the tar block size. */
- size += tarPaddingBytesRequired(statbuf.st_size);
-
- /* Size of the header for the file. */
- size += TAR_BLOCK_SIZE;
- }
+ archive_file(archiver, pathbuf, pathbuf + basepathlen + 1,
+ &statbuf, true,
+ isDbDir ? atooid(lastDir + 1) : InvalidOid,
+ manifest, spcoid);
}
else
ereport(WARNING,
(errmsg("skipping special file \"%s\"", pathbuf)));
}
FreeDir(dir);
- return size;
}
/*
@@ -1332,14 +1322,11 @@ is_checksummed_file(const char *fullpath, const char *filename)
*
* If dboid is anything other than InvalidOid then any checksum failures detected
* will get reported to the stats collector.
- *
- * Returns true if the file was successfully sent, false if 'missing_ok',
- * and the file did not exist.
*/
-static bool
-sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
- struct stat *statbuf, bool missing_ok, Oid dboid,
- backup_manifest_info *manifest, const char *spcoid)
+static void
+archive_file(bbarchiver *archiver, const char *readfilename,
+ const char *tarfilename, struct stat *statbuf, bool missing_ok,
+ Oid dboid, backup_manifest_info *manifest, const char *spcoid)
{
FILE *fp;
BlockNumber blkno = 0;
@@ -1351,27 +1338,33 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
bool verify_checksum = false;
pg_checksum_context checksum_ctx;
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
+ if (manifest != NULL)
+ pg_checksum_init(&checksum_ctx, manifest->checksum_type);
+
+ bbarchiver_begin_file(archiver, tarfilename, statbuf);
+
+ if (!bbarchiver_needs_file_contents(archiver))
+ {
+ bbarchiver_end_file(archiver);
+ return;
+ }
fp = AllocateFile(readfilename, "rb");
if (fp == NULL)
{
if (errno == ENOENT && missing_ok)
- return false;
+ return;
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
-
if (!noverify_checksums && DataChecksumsEnabled())
{
char *filename;
@@ -1523,10 +1516,11 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
}
}
- bbsink_archive_contents(sink, buf, cnt);
+ bbarchiver_file_contents(archiver, buf, cnt);
/* Also feed it to the checksum machinery. */
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
+ if (manifest != NULL)
+ pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
@@ -1550,23 +1544,14 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
while (len < statbuf->st_size)
{
cnt = Min(sizeof(buf), statbuf->st_size - len);
- bbsink_archive_contents(sink, buf, cnt);
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
+ bbarchiver_file_contents(archiver, buf, cnt);
+ if (manifest != NULL)
+ pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
}
}
- /*
- * Pad to a block boundary, per tar format requirements. (This small
- * piece of data is probably not worth throttling, and is not checksummed
- * because it's not actually part of the file.)
- */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- bbsink_archive_contents(sink, buf, pad);
- }
+ bbarchiver_end_file(archiver);
FreeFile(fp);
@@ -1583,54 +1568,15 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
total_checksum_failures += checksum_failures;
- AddFileToBackupManifest(manifest, spcoid, tarfilename, statbuf->st_size,
- (pg_time_t) statbuf->st_mtime, &checksum_ctx);
-
- return true;
-}
-
-
-static int64
-_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
-{
- char h[TAR_BLOCK_SIZE];
- enum tarError rc;
-
- if (!sizeonly)
- {
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
- statbuf->st_mtime);
-
- switch (rc)
- {
- case TAR_OK:
- break;
- case TAR_NAME_TOO_LONG:
- ereport(ERROR,
- (errmsg("file name too long for tar format: \"%s\"",
- filename)));
- break;
- case TAR_SYMLINK_TOO_LONG:
- ereport(ERROR,
- (errmsg("symbolic link target too long for tar format: "
- "file name \"%s\", target \"%s\"",
- filename, linktarget)));
- break;
- default:
- elog(ERROR, "unrecognized tar error: %d", rc);
- }
-
- bbsink_archive_contents(sink, h, sizeof(h));
- }
-
- return sizeof(h);
+ if (manifest != NULL)
+ AddFileToBackupManifest(manifest, spcoid, tarfilename,
+ statbuf->st_size,
+ (pg_time_t) statbuf->st_mtime, &checksum_ctx);
}
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
+ * directory.
*/
static void
convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
diff --git a/src/backend/replication/basebackup_tar.c b/src/backend/replication/basebackup_tar.c
new file mode 100644
index 0000000000..8618f2a0ec
--- /dev/null
+++ b/src/backend/replication/basebackup_tar.c
@@ -0,0 +1,266 @@
+#include "postgres.h"
+
+#include "pgtar.h"
+#include "replication/basebackup_archiver.h"
+
+typedef struct bbarchiver_tar
+{
+ bbarchiver base;
+ bbsink *sink;
+ size_t file_len;
+} bbarchiver_tar;
+
+typedef struct bbarchiver_tarsize
+{
+ bbarchiver base;
+ tablespaceinfo *tsinfo;
+} bbarchiver_tarsize;
+
+static void bbarchiver_tar_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+static void bbarchiver_tar_end_tablespace(bbarchiver *archiver);
+static void bbarchiver_tar_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tar_file_contents(bbarchiver *archiver,
+ const char *data,
+ size_t len);
+static void bbarchiver_tar_end_file(bbarchiver *archiver);
+static void bbarchiver_tar_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tar_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+static void report_tar_error(enum tarError rc, const char *filename,
+ const char *linktarget);
+
+static void bbarchiver_tarsize_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+static void bbarchiver_tarsize_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tarsize_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tarsize_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+static void add_tar_size(bbarchiver *archiver, uint64 file_size);
+
+const bbarchiver_ops bbarchiver_tar_ops = {
+ .begin_tablespace = bbarchiver_tar_begin_tablespace,
+ .end_tablespace = bbarchiver_tar_end_tablespace,
+ .begin_file = bbarchiver_tar_begin_file,
+ .file_contents = bbarchiver_tar_file_contents,
+ .end_file = bbarchiver_tar_end_file,
+ .directory = bbarchiver_tar_directory,
+ .symbolic_link = bbarchiver_tar_symbolic_link,
+};
+
+const bbarchiver_ops bbarchiver_tarsize_ops = {
+ .begin_tablespace = bbarchiver_tarsize_begin_tablespace,
+ .end_tablespace = bbarchiver_noop_end_tablespace,
+ .begin_file = bbarchiver_tarsize_begin_file,
+ .file_contents = NULL,
+ .end_file = bbarchiver_noop_end_file,
+ .directory = bbarchiver_tarsize_directory,
+ .symbolic_link = bbarchiver_tarsize_symbolic_link,
+};
+
+bbarchiver *
+bbarchiver_tar_new(bbsink *sink)
+{
+ bbarchiver_tar *archiver = palloc0(sizeof(bbarchiver_tar));
+
+ *((const bbarchiver_ops **) &archiver->base.bba_ops) = &bbarchiver_tar_ops;
+ archiver->base.bba_next = NULL;
+ archiver->sink = sink;
+
+ return &archiver->base;
+}
+
+static void
+bbarchiver_tar_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char *archive_name = "base.tar";
+
+ if (tsinfo->path != NULL)
+ archive_name = psprintf("%s.tar", tsinfo->oid);
+
+ bbsink_begin_archive(myarchiver->sink, archive_name);
+}
+
+static void
+bbarchiver_tar_end_tablespace(bbarchiver *archiver)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+
+ bbsink_end_archive(myarchiver->sink);
+}
+
+static void
+bbarchiver_tar_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char h[TAR_BLOCK_SIZE];
+ enum tarError rc;
+
+ myarchiver->file_len = 0;
+
+ rc = tarCreateHeader(h, relative_path, NULL, statbuf->st_size,
+ statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ statbuf->st_mtime);
+ if (rc != TAR_OK)
+ report_tar_error(rc, relative_path, NULL);
+
+ bbsink_archive_contents(myarchiver->sink, h, sizeof(h));
+}
+
+static void
+bbarchiver_tar_file_contents(bbarchiver *archiver, const char *data,
+ size_t len)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+
+ myarchiver->file_len += len;
+ bbsink_archive_contents(myarchiver->sink, data, len);
+}
+
+static void
+bbarchiver_tar_end_file(bbarchiver *archiver)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ int pad;
+
+ /* Pad to a block boundary, per tar format requirements. */
+ pad = tarPaddingBytesRequired(myarchiver->file_len);
+ if (pad > 0)
+ {
+ char buf[TAR_BLOCK_SIZE];
+
+ MemSet(buf, 0, pad);
+ bbsink_archive_contents(myarchiver->sink, buf, pad);
+ }
+}
+
+static void
+bbarchiver_tar_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char h[TAR_BLOCK_SIZE];
+ enum tarError rc;
+
+ rc = tarCreateHeader(h, relative_path, NULL, statbuf->st_size,
+ statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ statbuf->st_mtime);
+ if (rc != TAR_OK)
+ report_tar_error(rc, relative_path, NULL);
+
+ bbsink_archive_contents(myarchiver->sink, h, sizeof(h));
+}
+
+static void
+bbarchiver_tar_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char h[TAR_BLOCK_SIZE];
+ enum tarError rc;
+
+ rc = tarCreateHeader(h, relative_path, linktarget, statbuf->st_size,
+ statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ statbuf->st_mtime);
+ if (rc != TAR_OK)
+ report_tar_error(rc, relative_path, linktarget);
+
+ bbsink_archive_contents(myarchiver->sink, h, sizeof(h));
+}
+
+static void
+report_tar_error(enum tarError rc, const char *filename,
+ const char *linktarget)
+{
+ switch (rc)
+ {
+ case TAR_OK:
+ break;
+ case TAR_NAME_TOO_LONG:
+ ereport(ERROR,
+ (errmsg("file name too long for tar format: \"%s\"",
+ filename)));
+ break;
+ case TAR_SYMLINK_TOO_LONG:
+ Assert(linktarget != NULL);
+ ereport(ERROR,
+ (errmsg("symbolic link target too long for tar format: "
+ "file name \"%s\", target \"%s\"",
+ filename, linktarget)));
+ break;
+ default:
+ elog(ERROR, "unrecognized tar error: %d", rc);
+ }
+}
+
+/*
+ * Create an archiver that calculates an estimated size for a tar file built
+ * from the files visited.
+ */
+bbarchiver *
+bbarchiver_tarsize_new(void)
+{
+ bbarchiver_tarsize *archiver = palloc0(sizeof(bbarchiver_tarsize));
+
+ *((const bbarchiver_ops **) &archiver->base.bba_ops) =
+ &bbarchiver_tarsize_ops;
+ archiver->base.bba_next = NULL;
+
+ return &archiver->base;
+}
+
+static void
+bbarchiver_tarsize_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo)
+{
+ bbarchiver_tarsize *myarchiver = (bbarchiver_tarsize *) archiver;
+
+ myarchiver->tsinfo = tsinfo;
+ tsinfo->size = 0;
+}
+
+static void
+bbarchiver_tarsize_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ add_tar_size(archiver, statbuf->st_size);
+}
+
+static void
+bbarchiver_tarsize_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ add_tar_size(archiver, 0);
+}
+
+static void
+bbarchiver_tarsize_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf)
+{
+ add_tar_size(archiver, 0);
+}
+
+static void
+add_tar_size(bbarchiver *archiver, uint64 file_size)
+{
+ bbarchiver_tarsize *myarchiver = (bbarchiver_tarsize *) archiver;
+
+ myarchiver->tsinfo->size +=
+ TAR_BLOCK_SIZE + file_size + tarPaddingBytesRequired(file_size);
+}
--
2.24.2 (Apple Git-127)
Attachment: v1-0007-Convert-throttling-related-code-to-a-bbsink.patch
From 34f7bd2818173e103ef27d77eeeee830e4dc11e8 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 7 May 2020 12:22:17 -0400
Subject: [PATCH v1 07/11] Convert throttling-related code to a bbsink.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 123 +---------
src/backend/replication/basebackup_throttle.c | 211 ++++++++++++++++++
src/include/replication/basebackup_sink.h | 1 +
4 files changed, 217 insertions(+), 119 deletions(-)
create mode 100644 src/backend/replication/basebackup_throttle.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 6adc396501..58b6c228bb 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_libpq.o \
basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dea547081a..6fe0da2f49 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -74,7 +74,6 @@ static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
@@ -89,11 +88,6 @@ static char *statrelpath = NULL;
*/
#define TAR_SEND_SIZE 32768
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
/*
* Checks whether we encountered any error in fread(). fread() doesn't give
* any clue what has happened, so we check with ferror(). Also, neither
@@ -106,18 +100,6 @@ do { \
(errmsg("could not read from file \"%s\"", filename))); \
} while (0)
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
/* The starting XLOG position of the base backup. */
static XLogRecPtr startptr;
@@ -271,6 +253,10 @@ perform_base_backup(basebackup_options *opt)
List *tablespaces = NIL;
bbsink *sink = bbsink_libpq_new();
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
backup_total = 0;
backup_streamed = 0;
pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
@@ -379,30 +365,6 @@ perform_base_backup(basebackup_options *opt)
/* notify basebackup sink about start of backup */
bbsink_begin_backup(sink, startptr, starttli, tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
-
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
{
@@ -646,7 +608,6 @@ perform_base_backup(basebackup_options *opt)
update_basebackup_progress(cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -1620,7 +1581,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
- throttle(cnt);
if (feof(fp) || len >= statbuf->st_size)
{
@@ -1646,7 +1606,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
update_basebackup_progress(cnt);
len += cnt;
- throttle(cnt);
}
}
@@ -1740,80 +1699,6 @@ convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
-/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
- */
-static void
-throttle(size_t increment)
-{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
-
- /*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
- */
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
-
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
-
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
- }
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
-}
-
/*
* Increment the counter for the amount of data already streamed
* by the given number of bytes, and update the progress report for
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..0e3b4542bd
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,211 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The actual number of bytes, transfer of which may cause sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink,
+ XLogRecPtr startptr,
+ TimeLineID starttli,
+ List *tablespaces);
+static void bbsink_throttle_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink,
+ const char *data, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_backup(sink->bbs_next, startptr, starttli, tablespaces);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ throttle(mysink, len);
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_archive_contents(sink->bbs_next, data, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ throttle(mysink, len);
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_manifest_contents(sink->bbs_next, data, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrently WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only whole multiple of throttling_sample was
+ * processed. The rest will be done during the next call of this function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a8df937957..bc1710e2eb 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -175,5 +175,6 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_libpq_new(void);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
#endif
--
2.24.2 (Apple Git-127)
Attachment: v1-0011-WIP-Convert-backup-manifest-generation-to-a-bbarc.patch
From 1ccafb998fd54365cbe4de0acffd0d75878b1280 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 8 May 2020 15:21:01 -0400
Subject: [PATCH v1 11/11] WIP: Convert backup manifest generation to a
bbarchiver.
---
src/backend/replication/backup_manifest.c | 106 +++++++++++++++++++-
src/backend/replication/basebackup.c | 113 +++++++++-------------
src/include/replication/backup_manifest.h | 5 +-
3 files changed, 153 insertions(+), 71 deletions(-)
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 61f6f2c12b..bd9a5e4204 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,7 +17,6 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
-#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -371,3 +370,108 @@ AppendStringToManifest(backup_manifest_info *manifest, char *s)
errmsg("could not write to temporary file: %m")));
manifest->manifest_size += len;
}
+
+typedef struct bbarchiver_manifest
+{
+ bbarchiver base;
+
+ /* These things are fixed at creation time. */
+ backup_manifest_info *manifest;
+
+ /* This changes for each tablespace. */
+ const char *spcoid;
+
+ /* These change for each file. */
+ const char *file_pathname;
+ size_t file_size;
+ pg_time_t file_mtime;
+ pg_checksum_context file_checksum_ctx;
+} bbarchiver_manifest;
+
+static void bbarchiver_manifest_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+static void bbarchiver_manifest_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_manifest_file_contents(bbarchiver *archiver,
+ const char *data,
+ size_t len);
+static void bbarchiver_manifest_end_file(bbarchiver *archiver);
+
+static const bbarchiver_ops bbarchiver_manifest_ops = {
+ .begin_tablespace = bbarchiver_manifest_begin_tablespace,
+ .end_tablespace = bbarchiver_forward_end_tablespace,
+ .begin_file = bbarchiver_manifest_begin_file,
+ .file_contents = bbarchiver_manifest_file_contents,
+ .end_file = bbarchiver_manifest_end_file,
+ .directory = bbarchiver_forward_directory,
+ .symbolic_link = bbarchiver_forward_symbolic_link
+};
+
+bbarchiver *
+bbarchiver_manifest_new(bbarchiver *next, backup_manifest_info *manifest)
+{
+ bbarchiver_manifest *archiver = palloc0(sizeof(bbarchiver_manifest));
+
+ *((const bbarchiver_ops **) &archiver->base.bba_ops) =
+ &bbarchiver_manifest_ops;
+ archiver->base.bba_next = next;
+ archiver->manifest = manifest;
+
+ return &archiver->base;
+}
+
+static void
+bbarchiver_manifest_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ myarchiver->spcoid = tsinfo->oid;
+
+ bbarchiver_begin_tablespace(archiver->bba_next, tsinfo);
+}
+
+static void
+bbarchiver_manifest_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ myarchiver->file_pathname = relative_path;
+ myarchiver->file_size = statbuf->st_size;
+ myarchiver->file_mtime = statbuf->st_mtime;
+
+ pg_checksum_init(&myarchiver->file_checksum_ctx,
+ myarchiver->manifest->checksum_type);
+
+ bbarchiver_begin_file(archiver->bba_next, relative_path, statbuf);
+}
+
+static void
+bbarchiver_manifest_file_contents(bbarchiver *archiver,
+ const char *data, size_t len)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ pg_checksum_update(&myarchiver->file_checksum_ctx,
+ (uint8 *) data, len);
+
+ bbarchiver_file_contents(archiver->bba_next, data, len);
+}
+
+static void
+bbarchiver_manifest_end_file(bbarchiver *archiver)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ AddFileToBackupManifest(myarchiver->manifest,
+ myarchiver->spcoid,
+ myarchiver->file_pathname,
+ myarchiver->file_size,
+ myarchiver->file_mtime,
+ &myarchiver->file_checksum_ctx);
+
+ bbarchiver_end_file(archiver->bba_next);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c606e7cf58..0e0b043b13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -58,23 +58,17 @@ typedef struct
static void archive_database_cluster(List *tablespaces, bbarchiver *archiver,
StringInfo labelfile,
StringInfo tblspc_map_file,
- bool leave_main_tablespace_open,
- backup_manifest_info *manifest);
-static void archive_tablespace(bbarchiver *archiver, char *path, char *oid,
- struct backup_manifest_info *manifest);
+ bool leave_main_tablespace_open);
+static void archive_tablespace(bbarchiver *archiver, char *path);
static void archive_directory(bbarchiver *archiver, const char *path,
int basepathlen, List *tablespaces,
- bool sendtblspclinks,
- backup_manifest_info *manifest,
- const char *spcoid);
+ bool sendtblspclinks);
static void archive_file(bbarchiver *archiver, const char *readfilename,
const char *tarfilename, struct stat *statbuf,
- bool missing_ok, Oid dboid,
- backup_manifest_info *manifest, const char *spcoid);
+ bool missing_ok, Oid dboid);
static void archive_file_with_content(bbarchiver *archiver,
const char *filename,
- const char *content,
- backup_manifest_info *manifest);
+ const char *content);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -251,6 +245,14 @@ perform_base_backup(basebackup_options *opt)
bbarchiver *archiver;
bbarchiver *size_archiver;
+ /*
+ * The backup manifest code uses a BufFile, so create a ResourceOwner.
+ * This is cheap enough that we don't worry about doing it only if it's
+ * needed, and there might be other uses for it in the future.
+ */
+ Assert(CurrentResourceOwner == NULL);
+ CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -262,9 +264,13 @@ perform_base_backup(basebackup_options *opt)
archiver = bbarchiver_tar_new(sink);
size_archiver = bbarchiver_tarsize_new();
- /* we're going to use a BufFile, so we need a ResourceOwner */
- Assert(CurrentResourceOwner == NULL);
- CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
+ /* Set up backup manifest generation, if enabled. */
+ if (opt->manifest != MANIFEST_OPTION_NO)
+ {
+ InitializeBackupManifest(&manifest, opt->manifest,
+ opt->manifest_checksum_type);
+ archiver = bbarchiver_manifest_new(archiver, &manifest);
+ }
datadirpathlen = strlen(DataDir);
@@ -272,8 +278,6 @@ perform_base_backup(basebackup_options *opt)
labelfile = makeStringInfo();
tblspc_map_file = makeStringInfo();
- InitializeBackupManifest(&manifest, opt->manifest,
- opt->manifest_checksum_type);
total_checksum_failures = 0;
@@ -318,7 +322,7 @@ perform_base_backup(basebackup_options *opt)
basebackup_progress_estimate_backup_size();
archive_database_cluster(tablespaces, size_archiver, labelfile,
- tblspc_map_file, false, NULL);
+ tblspc_map_file, false);
}
/* notify basebackup sink about start of backup */
@@ -331,7 +335,7 @@ perform_base_backup(basebackup_options *opt)
* so that we can archive the WAL files as well.
*/
archive_database_cluster(tablespaces, archiver, labelfile,
- tblspc_map_file, opt->includewal, &manifest);
+ tblspc_map_file, opt->includewal);
basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
@@ -534,7 +538,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- archive_file_with_content(archiver, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "");
}
/*
@@ -558,19 +562,23 @@ perform_base_backup(basebackup_options *opt)
errmsg("could not stat file \"%s\": %m", pathbuf)));
archive_file(archiver, pathbuf, pathbuf, &statbuf, false,
- InvalidOid, &manifest, NULL);
+ InvalidOid);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- archive_file_with_content(archiver, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "");
}
bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
- SendBackupManifest(&manifest, sink);
+ if (opt->manifest != MANIFEST_OPTION_NO)
+ {
+ AddWALInfoToBackupManifest(&manifest, startptr, starttli,
+ endptr, endtli);
+ SendBackupManifest(&manifest, sink);
+ }
bbsink_end_backup(sink, endptr, endtli);
@@ -598,8 +606,7 @@ perform_base_backup(basebackup_options *opt)
static void
archive_database_cluster(List *tablespaces, bbarchiver *archiver,
StringInfo labelfile, StringInfo tblspc_map_file,
- bool leave_main_tablespace_open,
- backup_manifest_info *manifest)
+ bool leave_main_tablespace_open)
{
ListCell *lc;
@@ -617,20 +624,18 @@ archive_database_cluster(List *tablespaces, bbarchiver *archiver,
/* For the main tablespace, archive the backup_label first... */
archive_file_with_content(archiver, BACKUP_LABEL_FILE,
- labelfile->data, manifest);
+ labelfile->data);
/* Then the tablespace_map file, if present... */
if (tblspc_map_file != NULL)
{
archive_file_with_content(archiver, TABLESPACE_MAP,
- tblspc_map_file->data,
- manifest);
+ tblspc_map_file->data);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- archive_directory(archiver, ".", 1, tablespaces,
- sendtblspclinks, manifest, NULL);
+ archive_directory(archiver, ".", 1, tablespaces, sendtblspclinks);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -639,10 +644,10 @@ archive_database_cluster(List *tablespaces, bbarchiver *archiver,
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
archive_file(archiver, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE,
- &statbuf, false, InvalidOid, manifest, NULL);
+ &statbuf, false, InvalidOid);
}
else
- archive_tablespace(archiver, ti->path, ti->oid, manifest);
+ archive_tablespace(archiver, ti->path);
/*
* If we were asked to leave the main tablespace open, then do so.
@@ -873,14 +878,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
*/
static void
archive_file_with_content(bbarchiver *archiver, const char *filename,
- const char *content, backup_manifest_info *manifest)
+ const char *content)
{
struct stat statbuf;
int len;
- pg_checksum_context checksum_ctx;
-
- if (manifest != NULL)
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
len = strlen(content);
@@ -904,13 +905,6 @@ archive_file_with_content(bbarchiver *archiver, const char *filename,
if (bbarchiver_needs_file_contents(archiver))
bbarchiver_file_contents(archiver, content, len);
bbarchiver_end_file(archiver);
-
- if (manifest != NULL)
- {
- pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
- AddFileToBackupManifest(manifest, NULL, filename, len,
- (pg_time_t) statbuf.st_mtime, &checksum_ctx);
- }
}
/*
@@ -921,8 +915,7 @@ archive_file_with_content(bbarchiver *archiver, const char *filename,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static void
-archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
- backup_manifest_info *manifest)
+archive_tablespace(bbarchiver *archiver, char *path)
{
char pathbuf[MAXPGPATH];
struct stat statbuf;
@@ -953,8 +946,7 @@ archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
bbarchiver_directory(archiver, TABLESPACE_VERSION_DIRECTORY, &statbuf);
/* Send all the files in the tablespace version directory */
- archive_directory(archiver, pathbuf, strlen(path), NIL, true, manifest,
- spcoid);
+ archive_directory(archiver, pathbuf, strlen(path), NIL, true);
}
/*
@@ -971,8 +963,7 @@ archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
*/
static void
archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
- List *tablespaces, bool sendtblspclinks,
- backup_manifest_info *manifest, const char *spcoid)
+ List *tablespaces, bool sendtblspclinks)
{
DIR *dir;
struct dirent *de;
@@ -1258,14 +1249,13 @@ archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
if (!skip_this_dir)
archive_directory(archiver, pathbuf, basepathlen, tablespaces,
- sendtblspclinks, manifest, spcoid);
+ sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
archive_file(archiver, pathbuf, pathbuf + basepathlen + 1,
&statbuf, true,
- isDbDir ? atooid(lastDir + 1) : InvalidOid,
- manifest, spcoid);
+ isDbDir ? atooid(lastDir + 1) : InvalidOid);
}
else
ereport(WARNING,
@@ -1326,7 +1316,7 @@ is_checksummed_file(const char *fullpath, const char *filename)
static void
archive_file(bbarchiver *archiver, const char *readfilename,
const char *tarfilename, struct stat *statbuf, bool missing_ok,
- Oid dboid, backup_manifest_info *manifest, const char *spcoid)
+ Oid dboid)
{
FILE *fp;
BlockNumber blkno = 0;
@@ -1342,10 +1332,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
int segmentno = 0;
char *segmentpath;
bool verify_checksum = false;
- pg_checksum_context checksum_ctx;
-
- if (manifest != NULL)
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
bbarchiver_begin_file(archiver, tarfilename, statbuf);
@@ -1518,10 +1504,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
bbarchiver_file_contents(archiver, buf, cnt);
- /* Also feed it to the checksum machinery. */
- if (manifest != NULL)
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
-
len += cnt;
if (feof(fp) || len >= statbuf->st_size)
@@ -1545,8 +1527,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
{
cnt = Min(sizeof(buf), statbuf->st_size - len);
bbarchiver_file_contents(archiver, buf, cnt);
- if (manifest != NULL)
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
}
}
@@ -1567,11 +1547,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
}
total_checksum_failures += checksum_failures;
-
- if (manifest != NULL)
- AddFileToBackupManifest(manifest, spcoid, tarfilename,
- statbuf->st_size,
- (pg_time_t) statbuf->st_mtime, &checksum_ctx);
}
/*
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 043635b31c..e9211d04bf 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -14,7 +14,7 @@
#include "common/checksum_helper.h"
#include "pgtime.h"
-#include "replication/basebackup_sink.h"
+#include "replication/basebackup_archiver.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -49,4 +49,7 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
TimeLineID endtli);
extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
+extern bbarchiver *bbarchiver_manifest_new(bbarchiver *next,
+ backup_manifest_info *manifest);
+
#endif
--
2.24.2 (Apple Git-127)
Hi,
On 2020-05-08 16:53:09 -0400, Robert Haas wrote:
> They represent closely-related concepts, so much so that I initially
> thought we could get by with just one new abstraction layer. I found
> on experimentation that this did not work well, so I split it up into
> two and that worked a lot better. The distinction is this: a bbsink is
> something to which you can send a bunch of archives -- currently, each
> would be a tarfile -- and also a backup manifest. A bbarchiver is
> something to which you send every file in the data directory
> individually, or at least the ones that are getting backed up, plus
> any that are being injected into the backup (e.g. the backup_label).
> Commonly, a bbsink will do something with the data and then forward it
> to a subsequent bbsink, or a bbarchiver will do something with the
> data and then forward it to a subsequent bbarchiver or bbsink. For
> example, there's a bbarchiver_tar object which, like any bbarchiver,
> sees all the files and their contents as input. The output is a
> tarfile, which gets send to a bbsink. As things stand in the patch set
> now, the tar archives are ultimately sent to the "libpq" bbsink, which
> sends them to the client.
Hm.
I wonder if there are cases where recursively forwarding like this will
cause noticeable performance effects. The only operation that seems
frequent enough to potentially be noticeable would be "chunks" of the
file. So perhaps it'd be good to make sure we read in large enough
chunks?
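
For what it's worth, a coalescing sink chained in front of the libpq
sink could enforce a larger minimum write size without the archivers
having to know about it. A minimal sketch against the bbsink callbacks
from the patch set; the type name and the 128kB buffer size are
invented:

typedef struct bbsink_buffer
{
    bbsink      base;
    size_t      used;
    char        data[128 * 1024];   /* hypothetical coalescing buffer */
} bbsink_buffer;

static void
bbsink_buffer_archive_contents(bbsink *sink, const char *data, size_t len)
{
    bbsink_buffer *mysink = (bbsink_buffer *) sink;

    while (len > 0)
    {
        size_t      n = Min(len, sizeof(mysink->data) - mysink->used);

        memcpy(mysink->data + mysink->used, data, n);
        mysink->used += n;
        data += n;
        len -= n;

        /* Forward only when a full buffer has accumulated. */
        if (mysink->used == sizeof(mysink->data))
        {
            bbsink_archive_contents(sink->bbs_next, mysink->data,
                                    mysink->used);
            mysink->used = 0;
        }
    }
}

(The end_archive callback would of course have to flush any buffered
residue before forwarding the end of the archive.)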
> 0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
> bbarchiver, and refactors basebackup.c to make use of them. The tar
> bbarchiver puts the files it sees into tar archives and forwards the
> resulting archives to a bbsink. The tarsize bbarchiver is used to
> support the PROGRESS option to the BASE_BACKUP command. It just
> estimates the size of the backup by summing up the file sizes without
> reading them. This approach is good for a couple of reasons. First,
> without something like this, it's impossible to keep basebackup.c from
> knowing something about the tar format, because the PROGRESS option
> doesn't just figure out how big the files to be backed up are: it
> figures out how big it thinks the archives will be, and that involves
> tar-specific considerations.
ISTM that it's not actually good to have the progress calculations
include the tar overhead. As you say:
> This area needs more work, as the whole idea of measuring progress by
> estimating the archive size is going to break down as soon as
> server-side compression is in the picture.
This, to me, indicates that we should measure the progress solely based
on how much of the "source" data was processed. The overhead of tar, the
reduction due to compression, shouldn't be included.
What do you all think?
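
To make that concrete: a thin bbarchiver placed in front of the tar
archiver could account for nothing but the bytes of the underlying
files, independently of archive format and compression. A rough sketch,
with an invented type name, assuming the
PROGRESS_BASEBACKUP_BACKUP_STREAMED counter that basebackup.c already
updates:

typedef struct bbarchiver_rawprogress
{
    bbarchiver  base;
    uint64      bytes_done;     /* raw source bytes seen so far */
} bbarchiver_rawprogress;

static void
bbarchiver_rawprogress_file_contents(bbarchiver *archiver,
                                     const char *data, size_t len)
{
    bbarchiver_rawprogress *myarchiver = (bbarchiver_rawprogress *) archiver;

    /* Count the bytes of the file itself, not of the resulting archive. */
    myarchiver->bytes_done += len;
    pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_STREAMED,
                                 (int64) myarchiver->bytes_done);

    bbarchiver_file_contents(archiver->bba_next, data, len);
}

All of the other callbacks would simply forward to bba_next.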
I've not thought enough about the specifics, but I think it looks like
it's going roughly in a better direction.
One thing I wonder about is how stateful the interface is. Archivers
will pretty much always track which file is currently open etc. Somehow
such a repeating state machine seems a bit ugly - but I don't really
have a better answer.
Greetings,
Andres Freund
On Fri, May 8, 2020 at 5:27 PM Andres Freund <andres@anarazel.de> wrote:
I wonder if there are cases where recursively forwarding like this will
cause noticeable performance effects. The only operation that seems
frequent enough to potentially be noticeable would be "chunks" of the
file. So perhaps it'd be good to make sure we read in large enough
chunks?
Yeah, that needs to be tested. Right now the chunk size is 32kB but it
might be a good idea to go larger. Another thing is that right now the
chunk size is tied to the protocol message size, and I'm not sure
whether the size that's optimal for disk reads is also optimal for
protocol messages.
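To illustrate the sort of decoupling that might help here, below is a
minimal sketch of a pass-through sink that re-buffers incoming data
into a fixed outgoing chunk size, so the disk read size and the
protocol message size can be tuned independently. The callback shape
and the bbsink_archive_contents() forwarding call are assumptions
about how the bbsink API could look, not code from the patch set:

typedef struct bbsink_rechunk
{
    bbsink      base;       /* common part; must be first */
    bbsink     *next;       /* sink to forward re-chunked data to */
    char       *buf;        /* holds up to out_size bytes */
    size_t      out_size;   /* desired outgoing chunk size */
    size_t      used;       /* bytes currently buffered */
} bbsink_rechunk;

static void
bbsink_rechunk_archive_contents(bbsink *sink, const char *data, size_t len)
{
    bbsink_rechunk *mysink = (bbsink_rechunk *) sink;

    while (len > 0)
    {
        size_t      n = Min(len, mysink->out_size - mysink->used);

        memcpy(mysink->buf + mysink->used, data, n);
        mysink->used += n;
        data += n;
        len -= n;

        /* Forward only full chunks; any tail is flushed at end-of-archive. */
        if (mysink->used == mysink->out_size)
        {
            bbsink_archive_contents(mysink->next, mysink->buf, mysink->used);
            mysink->used = 0;
        }
    }
}

With something like that in the chain, the file-reading code could use
whatever read size works best for the filesystem while the libpq sink
still sees uniformly sized messages.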
This, to me, indicates that we should measure the progress solely based
on how much of the "source" data was processed. The overhead of tar, the
reduction due to compression, shouldn't be included.
I don't think it's a particularly bad thing that we include a small
amount of progress for sending an empty file, a directory, or a
symlink. That could make the results more meaningful if you have a
database with lots of empty relations in it. However, I agree that the
effect of compression shouldn't be included. To get there, I think we
need to redesign the wire protocol. Right now, the server has no way
of letting the client know how many uncompressed bytes it's sent, and
the client has no way of figuring it out without uncompressing, which
seems like something we want to avoid.
There are some other problems with the current wire protocol, too:
1. The syntax for the BASE_BACKUP command is large and unwieldy. We
really ought to adopt an extensible options syntax, like COPY, VACUUM,
EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
with bespoke lexer and parser support. (A sketch of what this could
look like follows point 2 below.)
2. The client is sent a list of tablespaces and is supposed to use
that to expect an equal number of archives, computing the name for
each one on the client side from the tablespace info. However, I think
we should be able to support modes like "put all the tablespaces in a
single archive" or "send a separate archive for every 256GB" or "ship
it all to the cloud and don't send me any archives". To get there, I
think we should have the server send the archive name to the clients,
and the client should just keep receiving the next archive until it's
told that there are no more. Then if there's one archive or ten
archives or no archives, the client doesn't have to care. It just
receives what the server sends until it hears that there are no more.
It also doesn't know how the server is naming the archives; the server
can, for example, adjust the archive names based on which compression
format is being chosen, without knowledge of the server's naming
conventions needing to exist on the client side.
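To make the syntax in point 1 concrete: today the command is a string
of bespoke keywords, e.g.

    BASE_BACKUP LABEL 'nightly' FAST MAX_RATE 32768 MANIFEST 'yes'

whereas a COPY-style option list might look something like

    BASE_BACKUP ( LABEL 'nightly', FAST, MAX_RATE 32768, MANIFEST 'yes' )

The option names here are just the existing ones; the exact spelling
of the parenthesized form is of course up for discussion.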
I think we should keep support for the current BASE_BACKUP command but
either add a new variant using extensible options, or else invent a
whole new command with a different name (BACKUP, SEND_BACKUP,
whatever) that takes extensible options. This command should send back
all the archives and the backup manifest using a single COPY stream
rather than multiple COPY streams. Within the COPY stream, we'll
invent a sub-protocol, e.g. based on the first letter of the message,
e.g.:
t = Tablespace boundary. No further message payload. Indicates, for
progress reporting purposes, that we are advancing to the next
tablespace.
f = Filename. The remainder of the message payload is the name of the
next file that will be transferred.
d = Data. The next four bytes contain the number of uncompressed bytes
covered by this message, for progress reporting purposes. The rest of
the message is payload, possibly compressed. Could be empty, if the
data is being shipped elsewhere, and these messages are only being
sent to update the client's notion of progress.
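As a rough sketch of how a client might consume such a stream
(PQgetCopyData() and PQfreemem() are real libpq calls, but the helper
functions and the exact message layout, including the byte order of
the progress field, are hypothetical):

char       *copybuf;
int         r;

while ((r = PQgetCopyData(conn, &copybuf, 0)) > 0)
{
    switch (copybuf[0])
    {
        case 't':               /* tablespace boundary; no payload */
            advance_to_next_tablespace();   /* hypothetical helper */
            break;
        case 'f':               /* filename: payload is the file name */
            open_next_output_file(copybuf + 1, r - 1);  /* hypothetical */
            break;
        case 'd':               /* 4 bytes of progress, then payload */
            {
                uint32      uncompressed;

                memcpy(&uncompressed, copybuf + 1, sizeof(uint32));
                update_progress(ntohl(uncompressed));   /* hypothetical */
                write_payload(copybuf + 5, r - 5);      /* hypothetical */
                break;
            }
    }
    PQfreemem(copybuf);
}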
I've not thought enough about the specifics, but I think it looks like
it's going roughly in a better direction.
Good to hear.
One thing I wonder about is how stateful the interface is. Archivers
will pretty much always track which file is currently open etc. Somehow
such a repeating state machine seems a bit ugly - but I don't really
have a better answer.
I thought about that a bit, too. There might be some way to unify that
by having some common context object that's defined by basebackup.c
and all archivers get it, so that they have some commonly-desired
details without needing bespoke code, but I'm not sure at this point
whether that will actually produce a nicer result. Even if we don't
have it initially, it seems like it wouldn't be very hard to add it
later, so I'm not too stressed about it.
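Purely as illustration, the shared context could be as simple as a
struct that basebackup.c fills in and hands to every archiver
callback; the fields here are guesses at the state most archivers
would otherwise track themselves:

typedef struct bbarchiver_context
{
    const char *current_path;   /* file currently being archived */
    int64       current_size;   /* its size, as measured when opened */
    int64       bytes_done;     /* how much of it has been archived */
    bool        in_file;        /* between begin-file and end-file? */
} bbarchiver_context;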
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Robert,
Please see my comments inline below.
On Tue, May 12, 2020 at 12:33 AM Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, that needs to be tested. Right now the chunk size is 32kB but it
might be a good idea to go larger. Another thing is that right now the
chunk size is tied to the protocol message size, and I'm not sure
whether the size that's optimal for disk reads is also optimal for
protocol messages.
One needs to strike a balance with the number of packets to be sent
across the network, so if the disk read size and the network packet
size could be unified, that might provide a better optimization.
I don't think it's a particularly bad thing that we include a small
amount of progress for sending an empty file, a directory, or a
symlink. That could make the results more meaningful if you have a
database with lots of empty relations in it. However, I agree that the
effect of compression shouldn't be included. To get there, I think we
need to redesign the wire protocol. Right now, the server has no way
of letting the client know how many uncompressed bytes it's sent, and
the client has no way of figuring it out without uncompressing, which
seems like something we want to avoid.
I agree here too; if we have too many tar files one might cringe, but
sending the extra amount for these tar files looks okay to me.
There are some other problems with the current wire protocol, too:
1. The syntax for the BASE_BACKUP command is large and unwieldy. We
really ought to adopt an extensible options syntax, like COPY, VACUUM,
EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
with bespoke lexer and parser support.
2. The client is sent a list of tablespaces and is supposed to use
that to expect an equal number of archives, computing the name for
each one on the client side from the tablespace info. However, I think
we should be able to support modes like "put all the tablespaces in a
single archive" or "send a separate archive for every 256GB" or "ship
it all to the cloud and don't send me any archives". To get there, I
think we should have the server send the archive name to the clients,
and the client should just keep receiving the next archive until it's
told that there are no more. Then if there's one archive or ten
archives or no archives, the client doesn't have to care. It just
receives what the server sends until it hears that there are no more.
It also doesn't know how the server is naming the archives; the server
can, for example, adjust the archive names based on which compression
format is being chosen, without knowledge of the server's naming
conventions needing to exist on the client side.
One thing to keep in mind here is the trade-off between the number of
options we provide and people coming back to report which combinations
do not work. For example, if a user script says "put all the
tablespaces in a single archive" and somebody later changes the script
to break the backup down at every 256GB, there is a conflict, and
which option takes precedence needs to be decided. When the number of
options like this becomes very large, this could lead to some
complications.
I think we should keep support for the current BASE_BACKUP command but
either add a new variant using extensible options, or else invent a
whole new command with a different name (BACKUP, SEND_BACKUP,
whatever) that takes extensible options. This command should send back
all the archives and the backup manifest using a single COPY stream
rather than multiple COPY streams. Within the COPY stream, we'll
invent a sub-protocol, e.g. based on the first letter of the message,
e.g.:
t = Tablespace boundary. No further message payload. Indicates, for
progress reporting purposes, that we are advancing to the next
tablespace.
f = Filename. The remainder of the message payload is the name of the
next file that will be transferred.
d = Data. The next four bytes contain the number of uncompressed bytes
covered by this message, for progress reporting purposes. The rest of
the message is payload, possibly compressed. Could be empty, if the
data is being shipped elsewhere, and these messages are only being
sent to update the client's notion of progress.
Here I support this.
I thought about that a bit, too. There might be some way to unify that
by having some common context object that's defined by basebackup.c
and all archivers get it, so that they have some commonly-desired
details without needing bespoke code, but I'm not sure at this point
whether that will actually produce a nicer result. Even if we don't
have it initially, it seems like it wouldn't be very hard to add it
later, so I'm not too stressed about it.
--
Sumanta Mukherjee
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 9, 2020 at 2:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
The overall idea looks quite nice. I had a look at some of the
patches, at least 0005 and 0006. At first look, I have one comment.
+/*
+ * Each archive is sent as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
Some of the bbsink_libpq_* functions call pq_* functions directly,
e.g. bbsink_libpq_begin_backup, whereas others call SendCopy*
functions, which in turn call the pq_* functions. I think the
bbsink_libpq_* functions could call the pq_* functions directly
instead of adding one more level of function calls.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, May 12, 2020 at 4:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Some of the bbsink_libpq_* functions call pq_* functions directly,
e.g. bbsink_libpq_begin_backup, whereas others call SendCopy*
functions, which in turn call the pq_* functions. I think the
bbsink_libpq_* functions could call the pq_* functions directly
instead of adding one more level of function calls.
I think all the helper functions have more than one caller, though.
That's why I created them - to avoid duplicating code.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 13, 2020 at 1:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, May 12, 2020 at 4:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Some of the bbsink_libpq_* functions call pq_* functions directly,
e.g. bbsink_libpq_begin_backup, whereas others call SendCopy*
functions, which in turn call the pq_* functions. I think the
bbsink_libpq_* functions could call the pq_* functions directly
instead of adding one more level of function calls.

I think all the helper functions have more than one caller, though.
That's why I created them - to avoid duplicating code.
You are right, somehow I missed that part. Sorry for the noise.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,
Did some performance testing by varying TAR_SEND_SIZE with Robert's
refactor patch and without the patch to check the impact.
Below are the details:
*Backup type*: local backup using pg_basebackup
*Data size*: Around 200GB (200 tables - each table around 1.05 GB)
*Different TAR_SEND_SIZE values*: 8kB, 32kB (default), 128kB, 1MB (1024kB)
*Server details:*
RAM: 500 GB
CPU details: Architecture: x86_64; CPU op-mode(s): 32-bit, 64-bit; Byte Order: Little Endian; CPU(s): 128
Filesystem: ext4
                          8kB              32kB (default)   128kB            1024kB
Without refactor patch    real 10m22.718s  real 8m36.245s   real 6m54.299s   real 18m3.511s
                          user 1m23.629s   user 1m8.471s    user 0m55.690s   user 1m38.197s
                          sys  8m51.410s   sys  7m21.520s   sys  5m46.502s   sys  9m36.517s
With refactor patch       real 10m11.350s  real 8m56.226s   real 7m26.678s   real 18m17.230s
(Robert's patch)          user 1m25.038s   user 1m9.774s    user 0m54.833s   user 1m42.749s
                          sys  8m39.226s   sys  7m41.032s   sys  6m20.057s   sys  9m53.704s
The above numbers are the minimum of two runs of each scenario. I can
see that with TAR_SEND_SIZE at 32kB or 128kB we get good performance,
whereas at 1MB it takes about 2.5x more time.
Please let me know your thoughts/suggestions on the same.
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
Hi Suraj,
Two points I wanted to mention.
1. The max rate at which the transfer happens when the tar send size
is 128kB is at most 0.48 GB/sec. Would it be possible to find out what
buffer size is being used? That could help us explain some part of the
puzzle.
2. Secondly, the idea of taking just the min of two runs is a bit
questionable: how do we justify the performance numbers and establish
that the differences are not related to noise? It might be better to
run a few experiments of each kind, then try to fit a basic linear
model and report the standard deviation. The order statistic
min(X1, X2, ..., Xn) is generally a biased estimator, and a variance
calculation on a biased statistic is a bit tricky, so the results
could be corrupted by noise.
With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com
On Wed, May 13, 2020 at 12:01 AM Suraj Kharage <
suraj.kharage@enterprisedb.com> wrote:
The above numbers are the minimum of two runs of each scenario. I can
see that with TAR_SEND_SIZE at 32kB or 128kB we get good performance,
whereas at 1MB it takes about 2.5x more time.
Please let me know your thoughts/suggestions on the same.
So the patch came out slightly faster at 8kB and slightly slower in the
other tests. That's kinda strange. I wonder if it's just noise. How much do
the results vary run to run?
I would've expected (and I think Andres thought the same) that a smaller
block size would be bad for the patch and a larger block size would be
good, but that's not what these numbers show.
I wouldn't worry too much about the regression at 1MB. Probably what's
happening there is that we're losing some concurrency - perhaps with
smaller block sizes the OS can buffer the entire chunk in the pipe
connecting pg_basebackup to the server and start on the next one, but when
you go up to 1MB it doesn't fit and ends up spending a lot of time waiting
for data to be read from the pipe. Wait event profiling might tell you
more. Probably what this suggests is that you want the largest buffer size
that doesn't cause you to overrun the network/pipe buffer and no larger.
Unfortunately, I have no idea how we'd figure that out dynamically, and I
don't see a reason to believe that everyone will have the same size buffers.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On Wed, May 13, 2020 at 7:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
So the patch came out slightly faster at 8kB and slightly slower in the
other tests. That's kinda strange. I wonder if it's just noise. How much do
the results vary run to run?
It does not vary much except for the 8kB run. Please see below the
details for both runs of each scenario.
                              8kB              32kB (default)   128kB            1024kB
Without refactor   1st run    real 10m50.924s  real 8m36.245s   real 7m8.690s    real 18m16.898s
patch                         user 1m29.774s   user 1m8.471s    user 0m54.840s   user 1m39.105s
                              sys  9m13.058s   sys  7m21.520s   sys  6m1.725s    sys  9m42.803s
                   2nd run    real 10m22.718s  real 8m44.455s   real 6m54.299s   real 18m3.511s
                              user 1m23.629s   user 1m7.896s    user 0m55.690s   user 1m38.197s
                              sys  8m51.410s   sys  7m28.909s   sys  5m46.502s   sys  9m36.517s
With refactor      1st run    real 10m11.350s  real 8m56.226s   real 7m26.678s   real 19m5.218s
patch                         user 1m25.038s   user 1m9.774s    user 0m54.833s   user 1m44.122s
                              sys  8m39.226s   sys  7m41.032s   sys  6m20.057s   sys  10m17.623s
                   2nd run    real 11m30.500s  real 9m4.103s    real 7m26.713s   real 18m17.230s
                              user 1m45.221s   user 1m6.893s    user 0m54.868s   user 1m42.749s
                              sys  9m37.815s   sys  7m49.393s   sys  6m19.652s   sys  9m53.704s
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
Hi,
I have repeated the experiment with the 8kB block size and found that
the results do not vary much after applying the patch.
Please find the details below.
*Backup type*: local backup using pg_basebackup
*Data size*: Around 200GB (200 tables - each table around 1.05 GB)
*TAR_SEND_SIZE value*: 8kB
*Server details:*
RAM: 500 GB
CPU details: Architecture: x86_64; CPU op-mode(s): 32-bit, 64-bit; Byte Order: Little Endian; CPU(s): 128
Filesystem: ext4
*Results:*
Iteration   Without refactor patch   With refactor patch
1st run     real 10m19.001s          real 9m45.291s
            user 1m37.895s           user 1m23.192s
            sys  8m33.008s           sys  8m14.993s
2nd run     real 9m33.970s           real 9m30.560s
            user 1m19.490s           user 1m22.124s
            sys  8m6.062s            sys  8m0.979s
3rd run     real 9m19.327s           real 8m59.241s
            user 1m21.772s           user 1m19.001s
            sys  7m50.613s           sys  7m32.645s
4th run     real 9m56.873s           real 9m52.290s
            user 1m22.370s           user 1m22.175s
            sys  8m27.054s           sys  8m23.052s
5th run     real 9m45.343s           real 9m49.633s
            user 1m23.113s           user 1m23.122s
            sys  8m15.418s           sys  8m19.240s
Later I connected with Suraj to validate the experiment details and
found that the setup and steps followed in this experiment are exactly
the same as in the previous one.
Thanks,
Dipesh
On Tue, Jun 30, 2020 at 10:45 AM Dipesh Pandit <dipesh.pandit@gmail.com>
wrote:
Hi,
I have repeated the experiment with the 8kB block size and found that
the results do not vary much after applying the patch.
Please find the details below.
Later I connected with Suraj to validate the experiment details and
found that the setup and steps followed in this experiment are exactly
the same as in the previous one.
Thanks Dipesh.
It looks like the results do not vary much in your runs, since you
followed the same steps.
One of my runs with 8kB that took more time than the others might have
been because of noise at that time.
--
Thanks & Regards,
Suraj kharage,
edbpostgres.com
On Fri, May 8, 2020 at 4:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
So it might be good if I'd remembered to attach the patches. Let's try
that again.
Here's an updated patch set. This is now rebased over master and
includes as 0001 the patch I posted separately at
/messages/by-id/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
but drops some other patches that were committed meanwhile. 0002-0009
of this series are basically the same as 0004-0011 from the previous
series, except for rebasing and fixing a bug I discovered in what's
now 0006. 0010 does a refactoring of pg_basebackup along similar lines
to the server-side refactoring from patches earlier in the series.
0011 is a really terrible, hacky, awful demonstration of how this
infrastructure can support server-side compression. If you apply it
and take a tar-format backup without -R, you will get .tar files that
are actually .tar.gz files. You can rename them, decompress them, and
use pg_verifybackup to check that everything is OK. If you try to do
anything else with 0011 applied, everything will break.
In the process of working on this, I learned a lot about how
pg_basebackup actually works, and found out about a number of things
that, with the benefit of hindsight, seem like they might not have
been the best way to go.
1. pg_basebackup -R injects recovery.conf (on older versions) or
injects standby.signal and appends to postgresql.auto.conf (on newer
versions) by parsing the tar file sent by the server and editing it on
the fly. From the point of view of server-side compression, this is
not ideal, because if you want to make these kinds of changes when
server-side compression is in use, you'd have to decompress the stream
on the client side in order to figure out where in the stream you ought
to inject your changes. But having to do that is a major expense. If
the client instead told the server what to change when generating the
archive, and the server did it, this expense could be avoided. It
would have the additional advantage that the backup manifest could
reflect the effects of those changes; right now it doesn't, and
pg_verifybackup just knows to expect differences in those files.
2. According to the comments, some tar programs require two tar blocks
(i.e. 512-byte blocks) of zero bytes at the end of an archive. The
server does not generate these blocks of zero bytes, so it basically
creates a tar file that works fine with my copy of tar but might break
with somebody else's. Instead, the client appends 1024 zero bytes to
the end of every file it receives from the server. That is an odd way
of fixing this problem, and it makes things rather inflexible. If the
server sends you any kind of a file OTHER THAN a tar file with the
last 1024 zero bytes stripped off, then adding 1024 zero bytes will be
the wrong thing to do. It would be better if the server just generated
fully correct tar files (whatever we think that means) and the client
wrote out exactly what it got from the server. Then, we could have the
server generate cpio archives or zip files or gzip-compressed tar
files or lz4-compressed tar files or anything we like, and the client
wouldn't really need to care as long as it didn't need to extract
those archives. That seems a lot cleaner. (A sketch of the padding
change follows point 5 below.)
3. The way that progress reporting works relies on the server knowing
exactly how large the archive sent to the client is going to be.
Progress as reckoned by the client is equal to the number of archive
payload bytes the client has received. This works OK with a tar
because we know how big the tar file is going to be based on the size
of the input files we intend to send, but it's unsuitable for any sort
of compressed archive (tar.gz, zip, whatever) because the compression
ratio cannot be predicted in advance. It would be better if the server
sent the payload bytes (possibly compressed) interleaved with progress
indicators, so that the client could correctly indicate that, say, the
backup is 30% complete because 30GB of 100GB has been processed on the
server side, even though the amount of data actually received by the
client might be 25GB or 20GB or 10GB or whatever because it got
compressed before transmission.
4. A related consideration is that we might want to have an option to
do something with the backup other than send it to the client. For
example, it might be useful to have an option for pg_basebackup to
tell the server to write the backup files to some specified server
directory, or to, say, S3. There are security concerns there, and I'm
not proposing to do anything about this immediately, but it seems like
something we might eventually want to have. In such a case, we are not
going to send any payload to the client, but the client probably still
wants progress indicators, so the current system of coupling progress
to the number of bytes received by the client breaks down for that
reason also.
5. As things stand today, the client must know exactly how many
archives it should expect to receive from the server and what each one
is. It can do that, because it knows to expect one archive per
tablespace, and the archive must be an uncompressed tarfile, so there
is no ambiguity. But, if the server could send archives to other
places, or send other kinds of archives to the client, then this would
become more complex. There is no intrinsic reason why the logic on the
client side can't simply be made more complicated in order to cope,
but it doesn't seem like great design, because then every time you
enhance the server, you've also got to enhance the client, and that
limits cross-version compatibility, and also seems more fragile. I
would rather that the server advertise the number of archives and the
names of each archive to the client explicitly, allowing the client to
be dumb unless it needs to post-process (e.g. extract) those archives.
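On the zero-padding issue from point 2, moving the padding to the
server could be about as simple as the sketch below;
bbsink_archive_contents() stands in for whatever forwarding routine
the tar archiver ends up using:

#define TAR_BLOCK_SIZE 512

static void
tar_send_end_of_archive(bbsink *sink)
{
    /*
     * POSIX requires a tar archive to end with at least two 512-byte
     * blocks of zero bytes. Emitting them on the server means the
     * client can write out exactly what it receives.
     */
    char        zerobuf[2 * TAR_BLOCK_SIZE] = {0};

    bbsink_archive_contents(sink, zerobuf, sizeof(zerobuf));
}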
Putting all of the above together, what I propose - but have not yet
tried to implement - is a new COPY sub-protocol for taking base
backups. Instead of sending a COPY stream per archive, the server
would send a single COPY stream where the first byte of each message
is a type indicator, like we do with the replication sub-protocol
today. For example, if the first byte is 'a' that could indicate that
we're beginning a new archive and the rest of the message would
indicate the archive name and perhaps some flags or options. If the
first byte is 'p' that could indicate that we're sending archive
payload, perhaps with the first four bytes of the message being
progress, i.e. the number of newly-processed bytes on the server side
prior to any compression, and the remaining bytes being payload. On
receipt of such a message, the client would increment the progress
indicator by the value indicated in those first four bytes, and then
process the remaining bytes by writing them to a file or whatever
behavior the user selected via -Fp, -Ft, -Z, etc. To be clear, I'm not
saying that this specific thing is the right thing, just something of
this sort. The server would need to continue supporting the current
multi-copy protocol for compatibility with older pg_basebackup
versions, and pg_basebackup would need to continue to support it for
compatibility with older server versions, but we could use the new
approach going forward. (Or, we could break compatibility, but that
would probably be unpopular and seems unnecessary and even risky to me
at this point.)
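For concreteness, here is a minimal sketch of how the server side
might assemble such a payload message using the existing pqformat
routines; pq_beginmessage() and friends are the real backend API,
while the 'p' message layout is just the sketch above, not a settled
design:

static void
send_archive_payload(const char *payload, size_t len,
                     uint32 uncompressed_bytes)
{
    StringInfoData buf;

    pq_beginmessage(&buf, 'd');     /* CopyData */
    pq_sendbyte(&buf, 'p');         /* sub-protocol: archive payload */
    pq_sendint32(&buf, uncompressed_bytes); /* progress, pre-compression */
    pq_sendbytes(&buf, payload, len);   /* possibly-compressed bytes */
    pq_endmessage(&buf);
}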
The ideas in the previous paragraph would address #3-#5 directly, but
they also indirectly address #2 because while we're switching
protocols we could easily move the padding with zero bytes to the
server side, and I think we should. #1 is a bit of a separate
consideration. To tackle #1 along the lines proposed above, the client
needs a way to send the recovery.conf contents to the server so that
the server can inject them into the tar file. It's not exactly clear
to me what the best way of permitting this is, and maybe there's a
totally different approach that would be altogether better. One thing
to consider is that we might well want the client to be able to send
*multiple* chunks of data to the server at the start of a backup. For
instance, suppose we want to support incremental backups. I think the
right approach is for the client to send the backup_manifest file from
the previous full backup to the server. What exactly the server does
with it afterward depends on your preferred approach, but the
necessary information is there. Maybe incremental backup is based on
comparing cryptographic checksums, so the server looks at all the
files and sends to the client those where the checksum (hopefully
SHA-something!) does not match. I wouldn't favor this approach myself,
but I know some people like it. Or maybe it's based on finding blocks
modified since the LSN of the previous backup; the manifest has enough
information for that to work, too. In such an approach, there can be
altogether new files with old LSNs, because files can be flat-copied
without changing block LSNs, so it's important to have the complete
list of files from the previous backup, and that too is in the
manifest. There are even timestamps for the bold among you. Anyway, my
point is to advocate for a design where the client says (1) I want a
backup with these options and then (2) here are 0, 1, or >1 files
(recovery parameters and/or backup manifest and/or other things) in
support of that and then the server hands back a stream of archives
which the client may or may not choose to post-process.
It's tempting to think about solving this problem by appealing to
CopyBoth, but I think that might be the wrong idea. The reason we use
CopyBoth for the replication subprotocol is because there's periodic
messages flowing in both directions that are only loosely coupled to
each other. Apart from reading frequently enough to avoid a deadlock
because both sides have full write buffers, each end of the connection
can kind of do whatever it wants. But for the kinds of use cases I'm
talking about here, that's not so. First the client talks and the
server acknowledges, then the reverse. That doesn't mean we couldn't
use CopyBoth, but maybe a CopyIn followed by a CopyOut would be more
straightforward. I am not sure of the details here and am happy to
hear the ideas of others.
One final thought is that the options framework for pg_basebackup is a
little unfortunate. As of today, what the client receives, always, is
a series of tar files. If you say -Fp, it doesn't change the backup
format; it just extracts the tar files. If you say -Ft, it doesn't. If
you say -Ft but also -Z, it compresses the tar files. Thinking just
about server-side compression and ignoring for the moment more remote
features like alternate archive formats (e.g. zip) or things like
storing the backup to an alternate location rather than returning it
to the client, you probably want the client to be able to specify at
least (1) server-side compression (perhaps with one of several
algorithms) and the client just writes the results, (2) server-side
compression (still with a choice of algorithm) and the client
decompresses but does not extract, (3) server-side compression (still
with a choice of algorithms) and the client decompresses and extracts,
(4) client-side compression (with a choice of algorithms), and (5)
client-side extraction. You might also want (6) server-side
compression (with a choice of algorithms) and client-side decompresses
and then re-compresses with a different algorithm (e.g. lz4 on the
server to save bandwidth at moderate CPU cost into parallel bzip2 on
the client for minimum archival storage). Or, as also discussed
upthread, you might want (7) server-side compression of each file
individually, so that you get a seekable archive of individually
compressed files (e.g. to support fast delta restore). I think that
with these refactoring patches - and the wire protocol redesign
mentioned above - it is very reasonable to offer as many of these
options as we believe users will find useful, but it is not very clear
how to extend the current command-line option framework to support
them. It probably would have been better if pg_basebackup, instead of
having -Fp and -Ft, just had an --extract option that you could either
specify or omit, because that would not have presumed anything about
the archive format, but the existing structure is well-baked at this
point.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
v2-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch
From 787068be3c23308b6fe46fc8c731c6d6ef82f485 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 11 Jun 2020 15:28:39 -0400
Subject: [PATCH v2 01/11] Flexible options for BASE_BACKUP and
CREATE_REPLICATION_SLOT.
---
src/backend/replication/basebackup.c | 33 ++---
.../libpqwalreceiver/libpqwalreceiver.c | 8 +-
src/backend/replication/repl_gram.y | 115 +++++++++++++++---
src/backend/replication/walsender.c | 15 +--
src/bin/pg_basebackup/pg_basebackup.c | 108 ++++++++++++----
src/bin/pg_basebackup/streamutil.c | 14 ++-
6 files changed, 220 insertions(+), 73 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6064384e32..d43c34e8e9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "catalog/pg_type.h"
#include "common/file_perm.h"
+#include "commands/defrem.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
@@ -777,7 +778,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->label = strVal(defel->arg);
+ opt->label = defGetString(defel);
o_label = true;
}
else if (strcmp(defel->defname, "progress") == 0)
@@ -786,7 +787,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->progress = true;
+ opt->progress = defGetBoolean(defel);
o_progress = true;
}
else if (strcmp(defel->defname, "fast") == 0)
@@ -795,16 +796,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->fastcheckpoint = true;
+ opt->fastcheckpoint = defGetBoolean(defel);
o_fast = true;
}
- else if (strcmp(defel->defname, "nowait") == 0)
+ else if (strcmp(defel->defname, "wait") == 0)
{
if (o_nowait)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->nowait = true;
+ opt->nowait = !defGetBoolean(defel);
o_nowait = true;
}
else if (strcmp(defel->defname, "wal") == 0)
@@ -813,19 +814,19 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->includewal = true;
+ opt->includewal = defGetBoolean(defel);
o_wal = true;
}
else if (strcmp(defel->defname, "max_rate") == 0)
{
- long maxrate;
+ int64 maxrate;
if (o_maxrate)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- maxrate = intVal(defel->arg);
+ maxrate = defGetInt64(defel);
if (maxrate < MAX_RATE_LOWER || maxrate > MAX_RATE_UPPER)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
@@ -841,21 +842,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->sendtblspcmapfile = true;
+ opt->sendtblspcmapfile = defGetBoolean(defel);
o_tablespace_map = true;
}
- else if (strcmp(defel->defname, "noverify_checksums") == 0)
+ else if (strcmp(defel->defname, "verify_checksums") == 0)
{
if (o_noverify_checksums)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- noverify_checksums = true;
+ noverify_checksums = !defGetBoolean(defel);
o_noverify_checksums = true;
}
else if (strcmp(defel->defname, "manifest") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
bool manifest_bool;
if (o_manifest)
@@ -880,7 +881,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "manifest_checksums") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
if (o_manifest_checksums)
ereport(ERROR,
@@ -895,8 +896,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
o_manifest_checksums = true;
}
else
- elog(ERROR, "option \"%s\" not recognized",
- defel->defname);
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname));
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index e9057230e4..c381e26143 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -828,19 +828,19 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
switch (snapshot_action)
{
case CRS_EXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " EXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, " (EXPORT_SNAPSHOT TRUE)");
break;
case CRS_NOEXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " NOEXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, " (EXPORT_SNAPSHOT FALSE)");
break;
case CRS_USE_SNAPSHOT:
- appendStringInfoString(&cmd, " USE_SNAPSHOT");
+ appendStringInfoString(&cmd, " (USE_SNAPSHOT)");
break;
}
}
else
{
- appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+ appendStringInfoString(&cmd, " PHYSICAL (RESERVE_WAL)");
}
res = libpqrcv_PQexec(conn->streamConn, cmd.data);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index f93a0de218..8b2109855d 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -94,16 +94,16 @@ static SQLCmd *make_sqlcmd(void);
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_legacy_opt_list generic_option_list
+%type <defelt> base_backup_legacy_opt generic_option
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
%type <node> plugin_opt_arg
-%type <str> opt_slot var_name
+%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
-%type <list> create_slot_opt_list
-%type <defelt> create_slot_opt
+%type <list> create_slot_options create_slot_legacy_opt_list
+%type <defelt> create_slot_legacy_opt
%%
@@ -156,12 +156,24 @@ var_name: IDENT { $$ = $1; }
;
/*
+ * BASE_BACKUP ( option [ 'value' ] [, ...] )
+ *
+ * We also still support the legacy syntax:
+ *
* BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
* [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
* [MANIFEST %s] [MANIFEST_CHECKSUMS %s]
+ *
+ * Future options should be supported only using the new syntax.
*/
base_backup:
- K_BASE_BACKUP base_backup_opt_list
+ K_BASE_BACKUP '(' generic_option_list ')'
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ $$ = (Node *) cmd;
+ }
+ | K_BASE_BACKUP base_backup_legacy_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
@@ -169,14 +181,14 @@ base_backup:
}
;
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
+base_backup_legacy_opt_list:
+ base_backup_legacy_opt_list base_backup_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-base_backup_opt:
+base_backup_legacy_opt:
K_LABEL SCONST
{
$$ = makeDefElem("label",
@@ -199,8 +211,8 @@ base_backup_opt:
}
| K_NOWAIT
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("wait",
+ (Node *)makeInteger(false), -1);
}
| K_MAX_RATE UCONST
{
@@ -214,8 +226,8 @@ base_backup_opt:
}
| K_NOVERIFY_CHECKSUMS
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("verify_checksums",
+ (Node *)makeInteger(false), -1);
}
| K_MANIFEST SCONST
{
@@ -230,8 +242,8 @@ base_backup_opt:
;
create_replication_slot:
- /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
- K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL [options] */
+ K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -241,8 +253,8 @@ create_replication_slot:
cmd->options = $5;
$$ = (Node *) cmd;
}
- /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
- | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin [options] */
+ | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -255,14 +267,19 @@ create_replication_slot:
}
;
-create_slot_opt_list:
- create_slot_opt_list create_slot_opt
+create_slot_options:
+ '(' generic_option_list ')' { $$ = $2; }
+ | create_slot_legacy_opt_list { $$ = $1; }
+ ;
+
+create_slot_legacy_opt_list:
+ create_slot_legacy_opt_list create_slot_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-create_slot_opt:
+create_slot_legacy_opt:
K_EXPORT_SNAPSHOT
{
$$ = makeDefElem("export_snapshot",
@@ -416,6 +433,64 @@ plugin_opt_arg:
sql_cmd:
IDENT { $$ = (Node *) make_sqlcmd(); }
;
+
+generic_option_list:
+ generic_option_list ',' generic_option
+ { $$ = lappend($1, $3); }
+ | generic_option
+ { $$ = list_make1($1); }
+ ;
+
+generic_option:
+ ident_or_keyword
+ {
+ $$ = makeDefElem($1, NULL, -1);
+ }
+ | ident_or_keyword IDENT
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword SCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword UCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeInteger($2), -1);
+ }
+ ;
+
+ident_or_keyword:
+ IDENT { $$ = $1; }
+ | K_BASE_BACKUP { $$ = "base_backup"; }
+ | K_IDENTIFY_SYSTEM { $$ = "identify_system"; }
+ | K_SHOW { $$ = "show"; }
+ | K_START_REPLICATION { $$ = "start_replication"; }
+ | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
+ | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
+ | K_TIMELINE_HISTORY { $$ = "timeline_history"; }
+ | K_LABEL { $$ = "label"; }
+ | K_PROGRESS { $$ = "progress"; }
+ | K_FAST { $$ = "fast"; }
+ | K_WAIT { $$ = "wait"; }
+ | K_NOWAIT { $$ = "nowait"; }
+ | K_MAX_RATE { $$ = "max_rate"; }
+ | K_WAL { $$ = "wal"; }
+ | K_TABLESPACE_MAP { $$ = "tablespace_map"; }
+ | K_NOVERIFY_CHECKSUMS { $$ = "noverify_checksums"; }
+ | K_TIMELINE { $$ = "timeline"; }
+ | K_PHYSICAL { $$ = "physical"; }
+ | K_LOGICAL { $$ = "logical"; }
+ | K_SLOT { $$ = "slot"; }
+ | K_RESERVE_WAL { $$ = "reserve_wal"; }
+ | K_TEMPORARY { $$ = "temporary"; }
+ | K_EXPORT_SNAPSHOT { $$ = "export_snapshot"; }
+ | K_NOEXPORT_SNAPSHOT { $$ = "noexport_snapshot"; }
+ | K_USE_SNAPSHOT { $$ = "use_snapshot"; }
+ | K_MANIFEST { $$ = "manifest"; }
+ | K_MANIFEST_CHECKSUMS { $$ = "manifest_checksums"; }
+ ;
+
%%
static SQLCmd *
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5e2210dd7b..5da45ecca0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -897,7 +897,8 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
snapshot_action_given = true;
- *snapshot_action = CRS_USE_SNAPSHOT;
+ if (defGetBoolean(defel))
+ *snapshot_action = CRS_USE_SNAPSHOT;
}
else if (strcmp(defel->defname, "reserve_wal") == 0)
{
@@ -907,7 +908,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
reserve_wal_given = true;
- *reserve_wal = true;
+ *reserve_wal = defGetBoolean(defel);
}
else
elog(ERROR, "unrecognized option: %s", defel->defname);
@@ -974,7 +975,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (EXPORT_SNAPSHOT)")));
need_full_snapshot = true;
}
@@ -984,25 +985,25 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (XactIsoLevel != XACT_REPEATABLE_READ)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called in REPEATABLE READ isolation mode transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (FirstSnapshotSet)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called before any query",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (IsSubTransaction())
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called in a subtransaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
need_full_snapshot = true;
}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4f29671d0c..0645e983c6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1787,6 +1787,49 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
appendPQExpBuffer(buf, copybuf, r);
}
+static void
+AppendBaseBackupPlainOption(StringInfo buf, bool use_new_option_syntax,
+ char *option_name)
+{
+ if (buf->len > 0)
+ {
+ if (use_new_option_syntax)
+ appendStringInfoString(buf, ", ");
+ else
+ appendStringInfoChar(buf, ' ');
+ }
+
+ appendStringInfoString(buf, option_name);
+}
+
+static void
+AppendBaseBackupStringOption(StringInfo buf, bool use_new_option_syntax,
+ char *option_name, char *option_value)
+{
+ AppendBaseBackupPlainOption(buf, use_new_option_syntax, option_name);
+
+ if (option_value != NULL)
+ {
+ size_t length = strlen(option_value);
+ char *escaped_value = palloc(1 + 2 * length);
+
+ PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
+ appendStringInfoString(buf, " '");
+ appendStringInfoString(buf, escaped_value);
+ appendStringInfoChar(buf, '\'');
+ pfree(escaped_value);
+ }
+}
+
+static void
+AppendBaseBackupIntegerOption(StringInfo buf, bool use_new_option_syntax,
+ char *option_name, int32 option_value)
+{
+ AppendBaseBackupPlainOption(buf, use_new_option_syntax, option_name);
+
+ appendStringInfo(buf, " %d", option_value);
+}
+
static void
BaseBackup(void)
{
@@ -1795,10 +1838,6 @@ BaseBackup(void)
TimeLineID latesttli;
TimeLineID starttli;
char *basebkp;
- char escaped_label[MAXPGPATH];
- char *maxrate_clause = NULL;
- char *manifest_clause = NULL;
- char *manifest_checksums_clause = "";
int i;
char xlogstart[64];
char xlogend[64];
@@ -1807,8 +1846,11 @@ BaseBackup(void)
int serverVersion,
serverMajor;
int writing_to_stdout;
+ bool use_new_option_syntax = false;
+ StringInfoData buf;
Assert(conn != NULL);
+ initStringInfo(&buf);
/*
* Check server version. BASE_BACKUP command was introduced in 9.1, so we
@@ -1826,6 +1868,8 @@ BaseBackup(void)
serverver ? serverver : "'unknown'");
exit(1);
}
+ if (serverMajor >= 1400)
+ use_new_option_syntax = true;
/*
* If WAL streaming was requested, also check that the server is new
@@ -1856,20 +1900,42 @@ BaseBackup(void)
/*
* Start the actual backup
*/
- PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);
-
+ AppendBaseBackupStringOption(&buf, use_new_option_syntax, "LABEL", label);
+ if (estimatesize)
+ AppendBaseBackupPlainOption(&buf, use_new_option_syntax, "PROGRESS");
+ if (includewal == FETCH_WAL)
+ AppendBaseBackupPlainOption(&buf, use_new_option_syntax, "WAL");
+ if (fastcheckpoint)
+ AppendBaseBackupPlainOption(&buf, use_new_option_syntax, "FAST");
+ if (includewal == NO_WAL)
+ {
+ if (use_new_option_syntax)
+ AppendBaseBackupIntegerOption(&buf, use_new_option_syntax, "WAIT", 0);
+ else
+ AppendBaseBackupPlainOption(&buf, use_new_option_syntax, "NOWAIT");
+ }
if (maxrate > 0)
- maxrate_clause = psprintf("MAX_RATE %u", maxrate);
+ AppendBaseBackupIntegerOption(&buf, use_new_option_syntax, "MAX_RATE",
+ maxrate);
+ if (format == 't')
+ AppendBaseBackupPlainOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+ if (!verify_checksums)
+ {
+ if (use_new_option_syntax)
+ AppendBaseBackupIntegerOption(&buf, use_new_option_syntax,
+ "VERIFY_CHECKSUMS", 0);
+ else
+ AppendBaseBackupPlainOption(&buf, use_new_option_syntax,
+ "NOVERIFY_CHECKSUMS");
+ }
if (manifest)
{
- if (manifest_force_encode)
- manifest_clause = "MANIFEST 'force-encode'";
- else
- manifest_clause = "MANIFEST 'yes'";
+ AppendBaseBackupStringOption(&buf, use_new_option_syntax, "MANIFEST",
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
- manifest_checksums_clause = psprintf("MANIFEST_CHECKSUMS '%s'",
- manifest_checksums);
+ AppendBaseBackupStringOption(&buf, use_new_option_syntax,
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
if (verbose)
@@ -1884,18 +1950,10 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s %s",
- escaped_label,
- estimatesize ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
- manifest_clause ? manifest_clause : "",
- manifest_checksums_clause);
+ if (use_new_option_syntax && buf.len > 0)
+ basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
+ else
+ basebkp = psprintf("BASE_BACKUP %s", buf.data);
if (PQsendQuery(conn, basebkp) == 0)
{
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 410116492e..8ed628d8fa 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -505,14 +505,24 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
{
appendPQExpBufferStr(query, " PHYSICAL");
if (reserve_wal)
- appendPQExpBufferStr(query, " RESERVE_WAL");
+ {
+ if (PQserverVersion(conn) >= 140000)
+ appendPQExpBufferStr(query, " (RESERVE_WAL)");
+ else
+ appendPQExpBufferStr(query, " RESERVE_WAL");
+ }
}
else
{
appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
if (PQserverVersion(conn) >= 100000)
+ {
/* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ if (PQserverVersion(conn) >= 140000)
+ appendPQExpBufferStr(query, " (EXPORT_SNAPSHOT 0)");
+ else
+ appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ }
}
res = PQexec(conn, query->data);
--
2.24.3 (Apple Git-128)
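For reviewers who'd rather not trace the grammar: here's a throwaway,
compile-alone C sketch of the two command shapes that BaseBackup() can now
generate. The option values are invented, and append_option() is only a
simplified stand-in for the AppendBaseBackup*Option helpers above (fixed
buffer, no escaping), not real client code.

#include <stdio.h>
#include <string.h>

/* Simplified stand-in for AppendBaseBackupPlainOption(): the new syntax
 * separates options with commas, the old one with spaces. */
static void
append_option(char *buf, int use_new_syntax, const char *option)
{
	if (*buf != '\0')
		strcat(buf, use_new_syntax ? ", " : " ");
	strcat(buf, option);
}

int
main(void)
{
	char		opts[256] = "";
	int			use_new_syntax = 1;	/* 0 when talking to a pre-v14 server */

	append_option(opts, use_new_syntax, "LABEL 'mybackup'");
	append_option(opts, use_new_syntax, "PROGRESS");
	append_option(opts, use_new_syntax, "MAX_RATE 1024");

	/* v14+:  BASE_BACKUP (LABEL 'mybackup', PROGRESS, MAX_RATE 1024)
	 * older: BASE_BACKUP LABEL 'mybackup' PROGRESS MAX_RATE 1024 */
	if (use_new_syntax)
		printf("BASE_BACKUP (%s)\n", opts);
	else
		printf("BASE_BACKUP %s\n", opts);
	return 0;
}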
Attachment: v2-0003-Introduce-bbsink-abstraction.patch
From 291072c1fce4717cef3e20b15fdcbeaf143d5e78 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:11:33 -0400
Subject: [PATCH v2 03/11] Introduce bbsink abstraction.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup_sink.c | 72 +++++++++
src/include/replication/basebackup_sink.h | 176 ++++++++++++++++++++++
3 files changed, 249 insertions(+)
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..25d56478f4 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_sink.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..bd0298990d
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,72 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+void
+bbsink_forward_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_backup(sink->bbs_next, startptr, starttli, tablespaces);
+}
+
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+void
+bbsink_forward_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_archive_contents(sink->bbs_next, data, len);
+}
+
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+void
+bbsink_forward_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_manifest_contents(sink->bbs_next, data, len);
+}
+
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..050cf1180d
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,176 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering the archives produced by the base backup process,
+ * or for sending them to a final destination
+ *
+ * From a logical point of view, a basebackup sink's callbacks are invoked
+ * after the source files read from the data directory have been assembled
+ * into archives (e.g. by creating one tar file per tablespace) but before
+ * those archives are sent to the client. In reality, processing is
+ * interleaved, with archives being generated incrementally and these
+ * callbacks being invoked on the archive fragments as they are generated.
+ * The point, however, is that a basebackup sink shouldn't be trying to
+ * do anything with individual data files, nor should it do anything that
+ * depends on a particular choice of archive format. It should only
+ * perform processing that treats the archives passed to it -- and the
+ * backup manifest -- as opaque blobs of bytes.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * If a bbsink needs to store additional state, it can allocate a larger
+ * structure whose first element is a bbsink.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ bbsink *bbs_next;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline functions
+ * rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /* This callback is invoked just once, at the very start of the backup. */
+ void (*begin_backup)(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces);
+
+ /*
+ * For each archive produced by the backup process, there will be one call
+ * to the begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ */
+ void (*begin_archive)(bbsink *sink, const char *archive_name);
+ void (*archive_contents)(bbsink *sink, const char *data, size_t len);
+ void (*end_archive)(bbsink *sink);
+
+ /*
+ * After all archives have been sent, and provided that the caller has
+ * requested a backup manifest, there will be one call to the
+ * begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback.
+ */
+ void (*begin_manifest)(bbsink *sink);
+ void (*manifest_contents)(bbsink *sink, const char *data, size_t len);
+ void (*end_manifest)(bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup)(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, XLogRecPtr startptr, TimeLineID starttli,
+ List *tablespaces)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->begin_backup(sink, startptr, starttli, tablespaces);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->archive_contents(sink, data, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->manifest_contents(sink, data, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli,
+ List *tablespaces);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, const char *data,
+ size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, const char *data,
+ size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+#endif
--
2.24.3 (Apple Git-128)
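The ops-table pattern is easier to evaluate with a worked example, so here
is a standalone sketch of how a sink type extends the common struct and
forwards to its successor. It deliberately re-declares a cut-down sink type
instead of including basebackup_sink.h, and the byte-counting sink is
hypothetical, so treat it as an illustration of the calling convention only.

#include <stdio.h>
#include <stddef.h>

typedef struct sink sink;

typedef struct sink_ops
{
	void		(*archive_contents) (sink *s, const char *data, size_t len);
} sink_ops;

/* Common part, like bbsink: callback table plus successor pointer. */
struct sink
{
	const sink_ops *ops;
	sink	   *next;
};

/* A larger struct whose first member is the common part, the same way the
 * real sinks stash their private bookkeeping. */
typedef struct counting_sink
{
	sink		base;
	size_t		total;
} counting_sink;

static void
counting_archive_contents(sink *s, const char *data, size_t len)
{
	counting_sink *mysink = (counting_sink *) s;

	mysink->total += len;
	if (s->next != NULL)
		s->next->ops->archive_contents(s->next, data, len);
}

static const sink_ops counting_ops = {
	.archive_contents = counting_archive_contents
};

int
main(void)
{
	counting_sink cs = {{&counting_ops, NULL}, 0};

	cs.base.ops->archive_contents(&cs.base, "hello", 5);
	printf("bytes seen: %zu\n", cs.total);
	return 0;
}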
Attachment: v2-0002-Recast-_tarWriteDirectory-as-convert_link_to_dire.patch
From c2c768d2733ba5c13f72e15fe7930bc52ff911d1 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v2 02/11] Recast _tarWriteDirectory as
convert_link_to_directory.
So that it doesn't get tangled up in tar-specific considerations.
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d43c34e8e9..6916132400 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -71,8 +71,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1356,7 +1355,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1372,7 +1373,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1384,7 +1387,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1854,12 +1859,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1868,8 +1872,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.3 (Apple Git-128)
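The whole trick in this patch is to lie in statbuf before anything
stat-driven inspects it. A minimal POSIX-only illustration of the same idea
(the patch additionally handles Windows junctions, and 0700 merely stands in
for pg_dir_create_mode):

#include <stdio.h>
#include <sys/stat.h>

/* Same idea as convert_link_to_directory(): if the stat buffer describes
 * a symlink, rewrite st_mode so later code emits a directory entry. */
static void
link_to_directory(struct stat *statbuf)
{
	if (S_ISLNK(statbuf->st_mode))
		statbuf->st_mode = S_IFDIR | 0700;
}

int
main(void)
{
	struct stat sb;

	if (lstat("/tmp", &sb) != 0)
		return 1;
	link_to_directory(&sb);
	printf("written as directory: %s\n", S_ISDIR(sb.st_mode) ? "yes" : "no");
	return 0;
}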
Attachment: v2-0005-Convert-throttling-related-code-to-a-bbsink.patch
From d346974e1f6f2a505c7a6d083b89f47affa4ebb0 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 7 May 2020 12:22:17 -0400
Subject: [PATCH v2 05/11] Convert throttling-related code to a bbsink.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 123 +---------
src/backend/replication/basebackup_throttle.c | 211 ++++++++++++++++++
src/include/replication/basebackup_sink.h | 1 +
4 files changed, 217 insertions(+), 119 deletions(-)
create mode 100644 src/backend/replication/basebackup_throttle.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 6adc396501..58b6c228bb 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_libpq.o \
basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index a56b0e9813..e0f469e3f2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -75,7 +75,6 @@ static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
@@ -92,23 +91,6 @@ static char *statrelpath = NULL;
*/
#define TAR_SEND_SIZE 32768
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
/* The starting XLOG position of the base backup. */
static XLogRecPtr startptr;
@@ -262,6 +244,10 @@ perform_base_backup(basebackup_options *opt)
List *tablespaces = NIL;
bbsink *sink = bbsink_libpq_new();
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
backup_total = 0;
backup_streamed = 0;
pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
@@ -370,30 +356,6 @@ perform_base_backup(basebackup_options *opt)
/* notify basebackup sink about start of backup */
bbsink_begin_backup(sink, startptr, starttli, tablespaces);
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
-
/* Send off our tablespaces one by one */
foreach(lc, tablespaces)
{
@@ -638,7 +600,6 @@ perform_base_backup(basebackup_options *opt)
update_basebackup_progress(cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -1614,7 +1575,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
@@ -1628,7 +1588,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
update_basebackup_progress(cnt);
len += cnt;
- throttle(cnt);
}
}
@@ -1722,80 +1681,6 @@ convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
-/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
- */
-static void
-throttle(size_t increment)
-{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
-
- /*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
- */
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
-
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
-
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
- }
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
-}
-
/*
* Increment the counter for the amount of data already streamed
* by the given number of bytes, and update the progress report for
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..0e3b4542bd
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,211 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The actual number of bytes, transfer of which may cause sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink,
+ XLogRecPtr startptr,
+ TimeLineID starttli,
+ List *tablespaces);
+static void bbsink_throttle_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink,
+ const char *data, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_backup(sink->bbs_next, startptr, starttli, tablespaces);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ throttle(mysink, len);
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_archive_contents(sink->bbs_next, data, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ throttle(mysink, len);
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_manifest_contents(sink->bbs_next, data, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only a whole multiple of throttling_sample
+ * has been processed. The rest will be done during the next call of this
+ * function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a8df937957..bc1710e2eb 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -175,5 +175,6 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_libpq_new(void);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
#endif
--
2.24.3 (Apple Git-128)
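As a sanity check on the constants that move into bbsink_throttle: with
THROTTLING_FREQUENCY = 8, the sink examines the clock once per
throttling_sample bytes and insists on at least elapsed_min_unit
microseconds per sample. A quick standalone calculation, with an invented
maxrate of 1 MB/s:

#include <stdio.h>
#include <stdint.h>

#define THROTTLING_FREQUENCY 8
#define USECS_PER_SEC 1000000

int
main(void)
{
	uint32_t	maxrate = 1024;	/* kB/s; invented for the example */
	int64_t		throttling_sample = (int64_t) maxrate * 1024 / THROTTLING_FREQUENCY;
	int64_t		elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;

	/* Prints 131072 bytes and 125000 us: check the transfer rate every
	 * 128 kB, and sleep whenever less than 125 ms have elapsed. */
	printf("sample: %lld bytes, min interval: %lld us\n",
		   (long long) throttling_sample, (long long) elapsed_min_unit);
	return 0;
}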
Attachment: v2-0004-Convert-libpq-related-code-to-a-bbsink.patch
From 6b52a8697a1c0d9bd515afcf7ed6a331c6a9e056 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 6 May 2020 12:08:21 -0400
Subject: [PATCH v2 04/11] Convert libpq-related code to a bbsink.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/backup_manifest.c | 18 +-
src/backend/replication/basebackup.c | 286 +++++--------------
src/backend/replication/basebackup_libpq.c | 309 +++++++++++++++++++++
src/include/replication/backup_manifest.h | 4 +-
src/include/replication/basebackup_sink.h | 3 +
6 files changed, 388 insertions(+), 233 deletions(-)
create mode 100644 src/backend/replication/basebackup_libpq.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 25d56478f4..6adc396501 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_libpq.o \
basebackup_sink.o \
repl_gram.o \
slot.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index b626004927..ff326bce19 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -283,9 +284,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -321,19 +321,15 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
+ * Send the backup manifest.
*
* We choose to read back the data from the temporary file in chunks of
* size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
* size, so it seems to make sense to match that value here.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
char manifestbuf[BLCKSZ];
@@ -347,12 +343,10 @@ SendBackupManifest(backup_manifest_info *manifest)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, manifestbuf, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6916132400..a56b0e9813 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,10 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +28,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -59,24 +57,23 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
static void throttle(size_t increment);
static void update_basebackup_progress(int64 delta);
@@ -263,6 +260,7 @@ perform_base_backup(basebackup_options *opt)
backup_manifest_info manifest;
int datadirpathlen;
List *tablespaces = NIL;
+ bbsink *sink = bbsink_libpq_new();
backup_total = 0;
backup_streamed = 0;
@@ -345,10 +343,10 @@ perform_base_backup(basebackup_options *opt)
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
+ tmp->size = sendDir(sink, ".", 1, true, tablespaces, true, NULL,
NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
backup_total += tmp->size;
}
@@ -369,11 +367,8 @@ perform_base_backup(basebackup_options *opt)
pgstat_progress_update_multi_param(3, index, val);
}
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, startptr, starttli, tablespaces);
/* Setup and activate network throttling, if client requested it */
if (opt->maxrate > 0)
@@ -403,33 +398,28 @@ perform_base_backup(basebackup_options *opt)
foreach(lc, tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
+ sendDir(sink, ".", 1, false, tablespaces, sendtblspclinks,
&manifest, NULL);
/* ... and pg_control after everything else. */
@@ -438,24 +428,30 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
Assert(lnext(tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
+ bbsink_end_archive(sink);
tblspc_streamed++;
pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
@@ -630,7 +626,7 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
while ((cnt = basebackup_read_file(fd, buf,
Min(sizeof(buf),
@@ -638,10 +634,7 @@ perform_base_backup(basebackup_options *opt)
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
+ bbsink_archive_contents(sink, buf, cnt);
update_basebackup_progress(cnt);
len += cnt;
@@ -674,7 +667,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +690,22 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -941,151 +933,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
@@ -1113,9 +965,8 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
+ bbsink_archive_contents(sink, content, len);
update_basebackup_progress(len);
/* Pad to a multiple of the tar block size. */
@@ -1125,7 +976,7 @@ sendFileWithContent(const char *filename, const char *content,
char buf[TAR_BLOCK_SIZE];
MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
+ bbsink_archive_contents(sink, buf, pad);
update_basebackup_progress(pad);
}
@@ -1142,7 +993,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1172,11 +1023,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1195,8 +1046,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1356,8 +1207,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1374,8 +1225,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1388,15 +1239,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1427,7 +1278,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1451,7 +1302,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1483,7 +1334,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1491,7 +1342,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1568,7 +1419,7 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
@@ -1601,7 +1452,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1756,10 +1607,7 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
+ bbsink_archive_contents(sink, buf, cnt);
update_basebackup_progress(cnt);
/* Also feed it to the checksum machinery. */
@@ -1776,7 +1624,7 @@ sendFile(const char *readfilename, const char *tarfilename,
while (len < statbuf->st_size)
{
cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
+ bbsink_archive_contents(sink, buf, cnt);
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
update_basebackup_progress(cnt);
len += cnt;
@@ -1793,7 +1641,7 @@ sendFile(const char *readfilename, const char *tarfilename,
if (pad > 0)
{
MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
+ bbsink_archive_contents(sink, buf, pad);
update_basebackup_progress(pad);
}
@@ -1820,7 +1668,7 @@ sendFile(const char *readfilename, const char *tarfilename,
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
char h[TAR_BLOCK_SIZE];
@@ -1851,7 +1699,7 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
+ bbsink_archive_contents(sink, h, sizeof(h));
update_basebackup_progress(sizeof(h));
}
diff --git a/src/backend/replication/basebackup_libpq.c b/src/backend/replication/basebackup_libpq.c
new file mode 100644
index 0000000000..f0024a881a
--- /dev/null
+++ b/src/backend/replication/basebackup_libpq.c
@@ -0,0 +1,309 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_libpq.c
+ * send archives and backup manifest to client via libpq
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_libpq.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_libpq_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces);
+static void bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_libpq_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_libpq_end_archive(bbsink *sink);
+static void bbsink_libpq_begin_manifest(bbsink *sink);
+static void bbsink_libpq_manifest_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_libpq_end_manifest(bbsink *sink);
+static void bbsink_libpq_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+
+const bbsink_ops bbsink_libpq_ops = {
+ .begin_backup = bbsink_libpq_begin_backup,
+ .begin_archive = bbsink_libpq_begin_archive,
+ .archive_contents = bbsink_libpq_archive_contents,
+ .end_archive = bbsink_libpq_end_archive,
+ .begin_manifest = bbsink_libpq_begin_manifest,
+ .manifest_contents = bbsink_libpq_manifest_contents,
+ .end_manifest = bbsink_libpq_end_manifest,
+ .end_backup = bbsink_libpq_end_backup
+};
+
+/*
+ * Create a new 'libpq' bbsink.
+ */
+bbsink *
+bbsink_libpq_new(void)
+{
+ bbsink *sink = palloc(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_libpq_ops;
+ sink->bbs_next = NULL;
+
+ return sink;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_libpq_begin_backup(bbsink *sink, XLogRecPtr startptr, TimeLineID starttli,
+ List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ SendXlogRecPtrResult(startptr, starttli);
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is sent as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_libpq_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ SendCopyData(data, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_libpq_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_libpq_manifest_contents(bbsink *sink, const char *data, size_t len)
+{
+ SendCopyData(data, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_libpq_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_libpq_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just one
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough, since TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index fb1291cbe4..bbd08f1852 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,6 +47,6 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 050cf1180d..a8df937957 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -173,4 +173,7 @@ extern void bbsink_forward_end_manifest(bbsink *sink);
extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_libpq_new(void);
+
#endif
--
2.24.3 (Apple Git-128)
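As an aside for reviewers: the constructor idiom in bbsink_libpq_new() is
really all a new sink needs. Here's a minimal sketch, hypothetical and not
part of the patch set, of a "discard" sink that accepts everything and
stores nothing, which could be handy for benchmarking the server side in
isolation. It assumes only what basebackup_sink.h already declares.

/* Hypothetical "discard" bbsink: accepts everything, stores nothing. */
#include "postgres.h"

#include "replication/basebackup_sink.h"

static void
bbsink_discard_begin_backup(bbsink *sink, XLogRecPtr startptr,
							TimeLineID starttli, List *tablespaces)
{
	/* Nothing to do; this is a terminal sink with no successor. */
}

static void
bbsink_discard_begin_archive(bbsink *sink, const char *archive_name)
{
}

static void
bbsink_discard_contents(bbsink *sink, const char *data, size_t len)
{
	/* Deliberately ignore the data. */
}

static void
bbsink_discard_end(bbsink *sink)
{
}

static void
bbsink_discard_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
{
}

static const bbsink_ops bbsink_discard_ops = {
	.begin_backup = bbsink_discard_begin_backup,
	.begin_archive = bbsink_discard_begin_archive,
	.archive_contents = bbsink_discard_contents,
	.end_archive = bbsink_discard_end,
	.begin_manifest = bbsink_discard_end,	/* same no-op shape */
	.manifest_contents = bbsink_discard_contents,
	.end_manifest = bbsink_discard_end,
	.end_backup = bbsink_discard_end_backup
};

bbsink *
bbsink_discard_new(void)
{
	bbsink	   *sink = palloc(sizeof(bbsink));

	*((const bbsink_ops **) &sink->bbs_ops) = &bbsink_discard_ops;
	sink->bbs_next = NULL;

	return sink;
}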
Attachment: v2-0006-Convert-progress-reporting-code-to-a-bbsink.patch
From 0937a4307c7e6067420676d73956b8ce158eb10a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 7 May 2020 15:18:39 -0400
Subject: [PATCH v2 06/11] Convert progress-reporting code to a bbsink.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 104 +------
src/backend/replication/basebackup_progress.c | 291 ++++++++++++++++++
src/include/replication/basebackup_sink.h | 8 +
4 files changed, 308 insertions(+), 96 deletions(-)
create mode 100644 src/backend/replication/basebackup_progress.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 58b6c228bb..7de4f82882 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_libpq.o \
+ basebackup_progress.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e0f469e3f2..51c523e4ae 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,7 +19,6 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
@@ -75,7 +74,6 @@ static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -100,15 +98,6 @@ static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -243,26 +232,14 @@ perform_base_backup(basebackup_options *opt)
int datadirpathlen;
List *tablespaces = NIL;
bbsink *sink = bbsink_libpq_new();
+ bbsink *progress_sink;
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
-
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -279,8 +256,7 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+ basebackup_progress_wait_checkpoint();
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
labelfile, &tablespaces,
tblspc_map_file, opt->sendtblspcmapfile);
@@ -296,7 +272,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -321,8 +296,7 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
foreach(lc, tablespaces)
{
@@ -334,25 +308,9 @@ perform_base_backup(basebackup_options *opt)
else
tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
}
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
/* notify basebackup sink about start of backup */
bbsink_begin_backup(sink, startptr, starttli, tablespaces);
@@ -414,14 +372,9 @@ perform_base_backup(basebackup_options *opt)
}
else
bbsink_end_archive(sink);
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -447,8 +400,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -597,7 +549,6 @@ perform_base_backup(basebackup_options *opt)
{
CheckXLogRemoved(segno, tli);
bbsink_archive_contents(sink, buf, cnt);
- update_basebackup_progress(cnt);
len += cnt;
@@ -682,7 +633,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -928,7 +879,6 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
_tarWriteHeader(sink, filename, NULL, &statbuf, false);
bbsink_archive_contents(sink, content, len);
- update_basebackup_progress(len);
/* Pad to a multiple of the tar block size. */
pad = tarPaddingBytesRequired(len);
@@ -938,7 +888,6 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
MemSet(buf, 0, pad);
bbsink_archive_contents(sink, buf, pad);
- update_basebackup_progress(pad);
}
pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
@@ -1569,7 +1518,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
}
bbsink_archive_contents(sink, buf, cnt);
- update_basebackup_progress(cnt);
/* Also feed it to the checksum machinery. */
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
@@ -1586,7 +1534,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
cnt = Min(sizeof(buf), statbuf->st_size - len);
bbsink_archive_contents(sink, buf, cnt);
pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
- update_basebackup_progress(cnt);
len += cnt;
}
}
@@ -1601,7 +1548,6 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
{
MemSet(buf, 0, pad);
bbsink_archive_contents(sink, buf, pad);
- update_basebackup_progress(pad);
}
CloseTransientFile(fd);
@@ -1659,7 +1605,6 @@ _tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
}
bbsink_archive_contents(sink, h, sizeof(h));
- update_basebackup_progress(sizeof(h));
}
return sizeof(h);
@@ -1681,39 +1626,6 @@ convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
-/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
- */
-static void
-update_basebackup_progress(int64 delta)
-{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
-}
-
/*
* Read some data from a file, setting a wait event and reporting any error
* encountered.
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..63513fa7b0
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,291 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress reporting. Data is forwarded to
+ * the next base backup sink in the chain and the number of bytes
+ * forwarded is used to update shared memory progress counters.
+ *
+ * Progress reporting requires extra callbacks that most base backup sinks
+ * don't need. Rather than cramming those into the interface, we have a few
+ * extra functions here that basebackup.c can call. (We could put the logic
+ * directly into that file as it's fairly simple, but it seems cleaner to
+ * have it all in one place.)
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_progress
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Are we estimating the backup size? */
+ bool estimate_backup_size;
+
+ /*
+ * Estimated total amount of backup data that will be streamed.
+ * -1 means that the size is not estimated.
+ */
+ int64 backup_total;
+
+ /* Amount of backup data already streamed */
+ int64 backup_streamed;
+
+ /* Total number of tablespaces. */
+ int tblspc_total;
+
+ /* Number of those that have been streamed. */
+ int tblspc_streamed;
+} bbsink_progress;
+
+static void bbsink_progress_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli,
+ List *tablespaces);
+static void bbsink_progress_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress reporting and forwards
+ * data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink_progress *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_progress));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_progress_ops;
+ sink->base.bbs_next = next;
+
+ sink->estimate_backup_size = estimate_backup_size;
+ sink->backup_total = -1;
+ sink->backup_streamed = 0;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of
+ * the backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ sink->backup_total);
+
+ return &sink->base;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink, XLogRecPtr startptr,
+ TimeLineID starttli, List *tablespaces)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ /* Save count of tablespaces. */
+ mysink->tblspc_total = list_length(tablespaces);
+
+ /*
+ * If the sizes of the individual tablespaces are being calculated, add
+ * them up to get a total size.
+ */
+ if (mysink->estimate_backup_size)
+ {
+ ListCell *lc;
+
+ mysink->backup_total = 0;
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ mysink->backup_total += ti->size;
+ }
+ }
+
+ /*
+ * Report that we are now streaming database files as a base backup.
+ * Also advertise the number of tablespaces, and, if known, the estimated
+ * total backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ val[1] = mysink->backup_total;
+ val[2] = mysink->tblspc_total;
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_backup(sink->bbs_next, startptr, starttli, tablespaces);
+}
+
+/*
+ * End-of-archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ /*
+ * We assume that the end of an archive means we've reached the end of a
+ * tablespace. That's not ideal: we might want to decouple those two
+ * concepts better.
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (mysink->tblspc_streamed < mysink->tblspc_total)
+ {
+ mysink->tblspc_streamed++;
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ mysink->tblspc_streamed);
+ }
+
+ /* Delegate to next sink. */
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * First pass the archive contents to the next sink, then perform progress updates.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ /* First forward to next sink. */
+ Assert(sink->bbs_next != NULL);
+ bbsink_archive_contents(sink->bbs_next, data, len);
+
+ /* Now increment count of what was sent by length of data. */
+ mysink->backup_streamed += len;
+ val[nparam++] = mysink->backup_streamed;
+
+ /*
+ * Avoid overflowing past 100% or the full size. This may make the total
+ * size number change as we approach the end of the backup (the estimate
+ * will always be wrong if WAL is included), but that's better than having
+ * the done column be bigger than the total.
+ */
+ if (mysink->backup_total > -1 &&
+ mysink->backup_streamed > mysink->backup_total)
+ {
+ mysink->backup_total = mysink->backup_streamed;
+ val[nparam++] = mysink->backup_total;
+ }
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+ bbsink_progress *mysink = (bbsink_progress *) sink;
+
+ Assert(mysink->tblspc_streamed >= mysink->tblspc_total - 1);
+ Assert(mysink->tblspc_streamed <= mysink->tblspc_total);
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+
+ /*
+ * We report having finished all tablespaces at this point, even if
+ * the archive for the main tablespace is still open, because what's
+ * going to be added is WAL files, not files that are really from the
+ * main tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = mysink->tblspc_streamed = mysink->tblspc_total;
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index bc1710e2eb..bf2d71fafa 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -175,6 +175,14 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_libpq_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
#endif
--
2.24.3 (Apple Git-128)
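One consequence of the above that may not be obvious from the diff: each
constructor takes the sink that comes *after* it, so the chain is assembled
from the client end backwards. Extracting just the relevant lines from
perform_base_backup as patched, the setup amounts to this sketch:

	bbsink	   *sink = bbsink_libpq_new();	/* final destination: the client */
	bbsink	   *progress_sink;

	/* Optionally rate-limit everything that flows toward the client. */
	if (opt->maxrate > 0)
		sink = bbsink_throttle_new(sink, opt->maxrate);

	/* Count bytes as they pass through, then forward to 'sink'. */
	sink = progress_sink = bbsink_progress_new(sink, opt->progress);

	/* Data now flows: progress -> (throttle ->) libpq -> client. */

The progress sink is constructed last so that it sits at the head of the
chain and counts every byte exactly once, regardless of what the sinks
downstream of it do.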
Attachment: v2-0009-WIP-Convert-backup-manifest-generation-to-a-bbarc.patch
From 6b41e7244e136325872a6bbeccbde2a13ddc79cc Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 8 May 2020 15:21:01 -0400
Subject: [PATCH v2 09/11] WIP: Convert backup manifest generation to a
bbarchiver.
---
src/backend/replication/backup_manifest.c | 106 +++++++++++++++++++-
src/backend/replication/basebackup.c | 113 +++++++++-------------
src/include/replication/backup_manifest.h | 5 +-
3 files changed, 153 insertions(+), 71 deletions(-)
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index ff326bce19..733c4d6313 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,7 +17,6 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
-#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -366,3 +365,108 @@ AppendStringToManifest(backup_manifest_info *manifest, char *s)
BufFileWrite(manifest->buffile, s, len);
manifest->manifest_size += len;
}
+
+typedef struct bbarchiver_manifest
+{
+ bbarchiver base;
+
+ /* These things are fixed at creation time. */
+ backup_manifest_info *manifest;
+
+ /* This changes for each tablespace. */
+ const char *spcoid;
+
+ /* These change for each file. */
+ const char *file_pathname;
+ size_t file_size;
+ pg_time_t file_mtime;
+ pg_checksum_context file_checksum_ctx;
+} bbarchiver_manifest;
+
+static void bbarchiver_manifest_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+static void bbarchiver_manifest_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_manifest_file_contents(bbarchiver *archiver,
+ const char *data,
+ size_t len);
+static void bbarchiver_manifest_end_file(bbarchiver *archiver);
+
+static const bbarchiver_ops bbarchiver_manifest_ops = {
+ .begin_tablespace = bbarchiver_manifest_begin_tablespace,
+ .end_tablespace = bbarchiver_forward_end_tablespace,
+ .begin_file = bbarchiver_manifest_begin_file,
+ .file_contents = bbarchiver_manifest_file_contents,
+ .end_file = bbarchiver_manifest_end_file,
+ .directory = bbarchiver_forward_directory,
+ .symbolic_link = bbarchiver_forward_symbolic_link
+};
+
+bbarchiver *
+bbarchiver_manifest_new(bbarchiver *next, backup_manifest_info *manifest)
+{
+ bbarchiver_manifest *archiver = palloc0(sizeof(bbarchiver_manifest));
+
+ *((const bbarchiver_ops **) &archiver->base.bba_ops) =
+ &bbarchiver_manifest_ops;
+ archiver->base.bba_next = next;
+ archiver->manifest = manifest;
+
+ return &archiver->base;
+}
+
+static void
+bbarchiver_manifest_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ myarchiver->spcoid = tsinfo->oid;
+
+ bbarchiver_begin_tablespace(archiver->bba_next, tsinfo);
+}
+
+static void
+bbarchiver_manifest_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ myarchiver->file_pathname = relative_path;
+ myarchiver->file_size = statbuf->st_size;
+ myarchiver->file_mtime = statbuf->st_mtime;
+
+ pg_checksum_init(&myarchiver->file_checksum_ctx,
+ myarchiver->manifest->checksum_type);
+
+ bbarchiver_begin_file(archiver->bba_next, relative_path, statbuf);
+}
+
+static void
+bbarchiver_manifest_file_contents(bbarchiver *archiver,
+ const char *data, size_t len)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ pg_checksum_update(&myarchiver->file_checksum_ctx,
+ (uint8 *) data, len);
+
+ bbarchiver_file_contents(archiver->bba_next, data, len);
+}
+
+static void
+bbarchiver_manifest_end_file(bbarchiver *archiver)
+{
+ bbarchiver_manifest *myarchiver = (bbarchiver_manifest *) archiver;
+
+ AddFileToBackupManifest(myarchiver->manifest,
+ myarchiver->spcoid,
+ myarchiver->file_pathname,
+ myarchiver->file_size,
+ myarchiver->file_mtime,
+ &myarchiver->file_checksum_ctx);
+
+ bbarchiver_end_file(archiver->bba_next);
+}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e8183daa68..30e242f99e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -59,23 +59,17 @@ typedef struct
static void archive_database_cluster(List *tablespaces, bbarchiver *archiver,
StringInfo labelfile,
StringInfo tblspc_map_file,
- bool leave_main_tablespace_open,
- backup_manifest_info *manifest);
-static void archive_tablespace(bbarchiver *archiver, char *path, char *oid,
- struct backup_manifest_info *manifest);
+ bool leave_main_tablespace_open);
+static void archive_tablespace(bbarchiver *archiver, char *path);
static void archive_directory(bbarchiver *archiver, const char *path,
int basepathlen, List *tablespaces,
- bool sendtblspclinks,
- backup_manifest_info *manifest,
- const char *spcoid);
+ bool sendtblspclinks);
static void archive_file(bbarchiver *archiver, const char *readfilename,
const char *tarfilename, struct stat *statbuf,
- bool missing_ok, Oid dboid,
- backup_manifest_info *manifest, const char *spcoid);
+ bool missing_ok, Oid dboid);
static void archive_file_with_content(bbarchiver *archiver,
const char *filename,
- const char *content,
- backup_manifest_info *manifest);
+ const char *content);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -242,6 +236,14 @@ perform_base_backup(basebackup_options *opt)
bbarchiver *archiver;
bbarchiver *size_archiver;
+ /*
+ * The backup manifest code uses a BufFile, so create a ResourceOwner.
+ * This is cheap enough that we don't bother creating it only when it's
+ * actually needed, and there might be other uses for it in the future.
+ */
+ Assert(CurrentResourceOwner == NULL);
+ CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -253,9 +255,13 @@ perform_base_backup(basebackup_options *opt)
archiver = bbarchiver_tar_new(sink);
size_archiver = bbarchiver_tarsize_new();
- /* we're going to use a BufFile, so we need a ResourceOwner */
- Assert(CurrentResourceOwner == NULL);
- CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
+ /* Set up backup manifest generation, if enabled. */
+ if (opt->manifest != MANIFEST_OPTION_NO)
+ {
+ InitializeBackupManifest(&manifest, opt->manifest,
+ opt->manifest_checksum_type);
+ archiver = bbarchiver_manifest_new(archiver, &manifest);
+ }
datadirpathlen = strlen(DataDir);
@@ -263,8 +269,6 @@ perform_base_backup(basebackup_options *opt)
labelfile = makeStringInfo();
tblspc_map_file = makeStringInfo();
- InitializeBackupManifest(&manifest, opt->manifest,
- opt->manifest_checksum_type);
total_checksum_failures = 0;
@@ -309,7 +313,7 @@ perform_base_backup(basebackup_options *opt)
basebackup_progress_estimate_backup_size();
archive_database_cluster(tablespaces, size_archiver, labelfile,
- tblspc_map_file, false, NULL);
+ tblspc_map_file, false);
}
/* notify basebackup sink about start of backup */
@@ -322,7 +326,7 @@ perform_base_backup(basebackup_options *opt)
* so that we can archive the WAL files as well.
*/
archive_database_cluster(tablespaces, archiver, labelfile,
- tblspc_map_file, opt->includewal, &manifest);
+ tblspc_map_file, opt->includewal);
basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
@@ -524,7 +528,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- archive_file_with_content(archiver, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "");
}
/*
@@ -548,19 +552,23 @@ perform_base_backup(basebackup_options *opt)
errmsg("could not stat file \"%s\": %m", pathbuf)));
archive_file(archiver, pathbuf, pathbuf, &statbuf, false,
- InvalidOid, &manifest, NULL);
+ InvalidOid);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- archive_file_with_content(archiver, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "");
}
bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
- SendBackupManifest(&manifest, sink);
+ if (opt->manifest != MANIFEST_OPTION_NO)
+ {
+ AddWALInfoToBackupManifest(&manifest, startptr, starttli,
+ endptr, endtli);
+ SendBackupManifest(&manifest, sink);
+ }
bbsink_end_backup(sink, endptr, endtli);
@@ -588,8 +596,7 @@ perform_base_backup(basebackup_options *opt)
static void
archive_database_cluster(List *tablespaces, bbarchiver *archiver,
StringInfo labelfile, StringInfo tblspc_map_file,
- bool leave_main_tablespace_open,
- backup_manifest_info *manifest)
+ bool leave_main_tablespace_open)
{
ListCell *lc;
@@ -607,20 +614,18 @@ archive_database_cluster(List *tablespaces, bbarchiver *archiver,
/* For the main tablespace, archive the backup_label first... */
archive_file_with_content(archiver, BACKUP_LABEL_FILE,
- labelfile->data, manifest);
+ labelfile->data);
/* Then the tablespace_map file, if present... */
if (tblspc_map_file != NULL)
{
archive_file_with_content(archiver, TABLESPACE_MAP,
- tblspc_map_file->data,
- manifest);
+ tblspc_map_file->data);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- archive_directory(archiver, ".", 1, tablespaces,
- sendtblspclinks, manifest, NULL);
+ archive_directory(archiver, ".", 1, tablespaces, sendtblspclinks);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -629,10 +634,10 @@ archive_database_cluster(List *tablespaces, bbarchiver *archiver,
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
archive_file(archiver, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE,
- &statbuf, false, InvalidOid, manifest, NULL);
+ &statbuf, false, InvalidOid);
}
else
- archive_tablespace(archiver, ti->path, ti->oid, manifest);
+ archive_tablespace(archiver, ti->path);
/*
* If we were asked to leave the main tablespace open, then do so.
@@ -865,14 +870,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
*/
static void
archive_file_with_content(bbarchiver *archiver, const char *filename,
- const char *content, backup_manifest_info *manifest)
+ const char *content)
{
struct stat statbuf;
int len;
- pg_checksum_context checksum_ctx;
-
- if (manifest != NULL)
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
len = strlen(content);
@@ -896,13 +897,6 @@ archive_file_with_content(bbarchiver *archiver, const char *filename,
if (bbarchiver_needs_file_contents(archiver))
bbarchiver_file_contents(archiver, content, len);
bbarchiver_end_file(archiver);
-
- if (manifest != NULL)
- {
- pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
- AddFileToBackupManifest(manifest, NULL, filename, len,
- (pg_time_t) statbuf.st_mtime, &checksum_ctx);
- }
}
/*
@@ -913,8 +907,7 @@ archive_file_with_content(bbarchiver *archiver, const char *filename,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static void
-archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
- backup_manifest_info *manifest)
+archive_tablespace(bbarchiver *archiver, char *path)
{
char pathbuf[MAXPGPATH];
struct stat statbuf;
@@ -945,8 +938,7 @@ archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
bbarchiver_directory(archiver, TABLESPACE_VERSION_DIRECTORY, &statbuf);
/* Send all the files in the tablespace version directory */
- archive_directory(archiver, pathbuf, strlen(path), NIL, true, manifest,
- spcoid);
+ archive_directory(archiver, pathbuf, strlen(path), NIL, true);
}
/*
@@ -963,8 +955,7 @@ archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
*/
static void
archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
- List *tablespaces, bool sendtblspclinks,
- backup_manifest_info *manifest, const char *spcoid)
+ List *tablespaces, bool sendtblspclinks)
{
DIR *dir;
struct dirent *de;
@@ -1250,14 +1241,13 @@ archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
if (!skip_this_dir)
archive_directory(archiver, pathbuf, basepathlen, tablespaces,
- sendtblspclinks, manifest, spcoid);
+ sendtblspclinks);
}
else if (S_ISREG(statbuf.st_mode))
{
archive_file(archiver, pathbuf, pathbuf + basepathlen + 1,
&statbuf, true,
- isDbDir ? atooid(lastDir + 1) : InvalidOid,
- manifest, spcoid);
+ isDbDir ? atooid(lastDir + 1) : InvalidOid);
}
else
ereport(WARNING,
@@ -1318,7 +1308,7 @@ is_checksummed_file(const char *fullpath, const char *filename)
static void
archive_file(bbarchiver *archiver, const char *readfilename,
const char *tarfilename, struct stat *statbuf, bool missing_ok,
- Oid dboid, backup_manifest_info *manifest, const char *spcoid)
+ Oid dboid)
{
int fd;
BlockNumber blkno = 0;
@@ -1334,10 +1324,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
int segmentno = 0;
char *segmentpath;
bool verify_checksum = false;
- pg_checksum_context checksum_ctx;
-
- if (manifest != NULL)
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
bbarchiver_begin_file(archiver, tarfilename, statbuf);
@@ -1512,10 +1498,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
bbarchiver_file_contents(archiver, buf, cnt);
- /* Also feed it to the checksum machinery. */
- if (manifest != NULL)
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
-
len += cnt;
}
@@ -1527,8 +1509,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
{
cnt = Min(sizeof(buf), statbuf->st_size - len);
bbarchiver_file_contents(archiver, buf, cnt);
- if (manifest != NULL)
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
}
}
@@ -1549,11 +1529,6 @@ archive_file(bbarchiver *archiver, const char *readfilename,
}
total_checksum_failures += checksum_failures;
-
- if (manifest != NULL)
- AddFileToBackupManifest(manifest, spcoid, tarfilename,
- statbuf->st_size,
- (pg_time_t) statbuf->st_mtime, &checksum_ctx);
}
/*
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index bbd08f1852..d49b890743 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -14,7 +14,7 @@
#include "common/checksum_helper.h"
#include "pgtime.h"
-#include "replication/basebackup_sink.h"
+#include "replication/basebackup_archiver.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -49,4 +49,7 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
TimeLineID endtli);
extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
+extern bbarchiver *bbarchiver_manifest_new(bbarchiver *next,
+ backup_manifest_info *manifest);
+
#endif
--
2.24.3 (Apple Git-128)
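The bbarchiver side composes the same way: a filtering archiver only has
to override the callbacks it cares about and forward the rest. As a rough
sketch, hypothetical and not in the patch set, here is a pass-through
archiver that counts regular files, written in the same style as
bbarchiver_manifest above. The entries marked "assumed" presume forward
helpers analogous to bbarchiver_forward_end_tablespace that this patch
set doesn't show.

/* Hypothetical file-counting bbarchiver: forwards everything unchanged. */
#include "postgres.h"

#include "replication/basebackup_archiver.h"

typedef struct bbarchiver_count
{
	bbarchiver	base;
	uint64		nfiles;
} bbarchiver_count;

static void
bbarchiver_count_begin_file(bbarchiver *archiver, const char *relative_path,
							struct stat *statbuf)
{
	bbarchiver_count *myarchiver = (bbarchiver_count *) archiver;

	myarchiver->nfiles++;
	bbarchiver_begin_file(archiver->bba_next, relative_path, statbuf);
}

static const bbarchiver_ops bbarchiver_count_ops = {
	.begin_tablespace = bbarchiver_forward_begin_tablespace,	/* assumed */
	.end_tablespace = bbarchiver_forward_end_tablespace,
	.begin_file = bbarchiver_count_begin_file,
	.file_contents = bbarchiver_forward_file_contents,			/* assumed */
	.end_file = bbarchiver_forward_end_file,					/* assumed */
	.directory = bbarchiver_forward_directory,
	.symbolic_link = bbarchiver_forward_symbolic_link
};

bbarchiver *
bbarchiver_count_new(bbarchiver *next)
{
	bbarchiver_count *archiver = palloc0(sizeof(bbarchiver_count));

	*((const bbarchiver_ops **) &archiver->base.bba_ops) =
		&bbarchiver_count_ops;
	archiver->base.bba_next = next;

	return &archiver->base;
}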
Attachment: v2-0008-Create-and-use-bbarchiver-implementations-for-tar.patch
From be1c2d894839531df520a7552068349654b0e0cb Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 7 May 2020 18:10:30 -0400
Subject: [PATCH v2 08/11] Create and use bbarchiver implementations for tar
and tar sizing.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 428 ++++++++++-------------
src/backend/replication/basebackup_tar.c | 266 ++++++++++++++
3 files changed, 454 insertions(+), 241 deletions(-)
create mode 100644 src/backend/replication/basebackup_tar.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index aacccd350d..6b3c77f2c0 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -21,6 +21,7 @@ OBJS = \
basebackup_libpq.o \
basebackup_progress.o \
basebackup_sink.o \
+ basebackup_tar.o \
basebackup_throttle.o \
repl_gram.o \
slot.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 51c523e4ae..e8183daa68 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -27,7 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
-#include "replication/basebackup_sink.h"
+#include "replication/basebackup_archiver.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -56,20 +56,26 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
- struct backup_manifest_info *manifest);
-static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
- List *tablespaces, bool sendtblspclinks,
- backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
- struct stat *statbuf, bool missing_ok, Oid dboid,
- backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(bbsink *sink, const char *filename,
- const char *content,
- backup_manifest_info *manifest);
-static int64 _tarWriteHeader(bbsink *sink, const char *filename,
- const char *linktarget, struct stat *statbuf,
- bool sizeonly);
+static void archive_database_cluster(List *tablespaces, bbarchiver *archiver,
+ StringInfo labelfile,
+ StringInfo tblspc_map_file,
+ bool leave_main_tablespace_open,
+ backup_manifest_info *manifest);
+static void archive_tablespace(bbarchiver *archiver, char *path, char *oid,
+ struct backup_manifest_info *manifest);
+static void archive_directory(bbarchiver *archiver, const char *path,
+ int basepathlen, List *tablespaces,
+ bool sendtblspclinks,
+ backup_manifest_info *manifest,
+ const char *spcoid);
+static void archive_file(bbarchiver *archiver, const char *readfilename,
+ const char *tarfilename, struct stat *statbuf,
+ bool missing_ok, Oid dboid,
+ backup_manifest_info *manifest, const char *spcoid);
+static void archive_file_with_content(bbarchiver *archiver,
+ const char *filename,
+ const char *content,
+ backup_manifest_info *manifest);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -233,6 +239,8 @@ perform_base_backup(basebackup_options *opt)
List *tablespaces = NIL;
bbsink *sink = bbsink_libpq_new();
bbsink *progress_sink;
+ bbarchiver *archiver;
+ bbarchiver *size_archiver;
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
@@ -241,6 +249,10 @@ perform_base_backup(basebackup_options *opt)
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
+ /* Set up tar archiving. */
+ archiver = bbarchiver_tar_new(sink);
+ size_archiver = bbarchiver_tarsize_new();
+
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
CurrentResourceOwner = ResourceOwnerCreate(NULL, "base backup");
@@ -260,6 +272,8 @@ perform_base_backup(basebackup_options *opt)
startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
labelfile, &tablespaces,
tblspc_map_file, opt->sendtblspcmapfile);
+ if (!opt->sendtblspcmapfile)
+ tblspc_map_file = NULL;
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -270,7 +284,6 @@ perform_base_backup(basebackup_options *opt)
PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
{
- ListCell *lc;
tablespaceinfo *ti;
/*
@@ -290,89 +303,26 @@ perform_base_backup(basebackup_options *opt)
ti->size = -1;
tablespaces = lappend(tablespaces, ti);
- /*
- * Calculate the total backup size by summing up the size of each
- * tablespace
- */
+ /* estimate sizes of all tablespaces, if PROGRESS option was given */
if (opt->progress)
{
basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
- {
- tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
-
- if (tmp->path == NULL)
- tmp->size = sendDir(sink, ".", 1, true, tablespaces, true, NULL,
- NULL);
- else
- tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
- NULL);
- }
+ archive_database_cluster(tablespaces, size_archiver, labelfile,
+ tblspc_map_file, false, NULL);
}
/* notify basebackup sink about start of backup */
bbsink_begin_backup(sink, startptr, starttli, tablespaces);
- /* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
-
- if (ti->path == NULL)
- {
- struct stat statbuf;
- bool sendtblspclinks = true;
-
- bbsink_begin_archive(sink, "base.tar");
-
- /* In the main tar, include the backup_label first... */
- sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
- &manifest);
-
- /* Then the tablespace_map file, if required... */
- if (opt->sendtblspcmapfile)
- {
- sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
- &manifest);
- sendtblspclinks = false;
- }
-
- /* Then the bulk of the files... */
- sendDir(sink, ".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
-
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
- false, InvalidOid, &manifest, NULL);
- }
- else
- {
- char *archive_name = psprintf("%s.tar", ti->oid);
-
- bbsink_begin_archive(sink, archive_name);
-
- sendTablespace(sink, ti->path, ti->oid, false, &manifest);
- }
-
- /*
- * If we're including WAL, and this is the main data directory we
- * don't treat this as the end of the tablespace. Instead, we will
- * include the xlog files below and stop afterwards. This is safe
- * since the main data directory is always sent *last*.
- */
- if (opt->includewal && ti->path == NULL)
- {
- Assert(lnext(tablespaces, lc) == NULL);
- }
- else
- bbsink_end_archive(sink);
- }
+ /*
+ * Back up all of the tablespaces.
+ *
+ * If the backup is to include WAL, leave the main tablespace open,
+ * so that we can archive the WAL files as well.
+ */
+ archive_database_cluster(tablespaces, archiver, labelfile,
+ tblspc_map_file, opt->includewal, &manifest);
basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
@@ -540,15 +490,14 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
-
+ bbarchiver_begin_file(archiver, pathbuf, &statbuf);
while ((cnt = basebackup_read_file(fd, buf,
Min(sizeof(buf),
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- bbsink_archive_contents(sink, buf, cnt);
+ bbarchiver_file_contents(archiver, buf, cnt);
len += cnt;
@@ -564,11 +513,7 @@ perform_base_backup(basebackup_options *opt)
errmsg("unexpected WAL file size \"%s\"", walFileName)));
}
- /*
- * wal_segment_size is a multiple of TAR_BLOCK_SIZE, so no need
- * for padding.
- */
- Assert(wal_segment_size % TAR_BLOCK_SIZE == 0);
+ bbarchiver_end_file(archiver);
CloseTransientFile(fd);
@@ -579,7 +524,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(sink, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "", &manifest);
}
/*
@@ -602,12 +547,12 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
- &manifest, NULL);
+ archive_file(archiver, pathbuf, pathbuf, &statbuf, false,
+ InvalidOid, &manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(sink, pathbuf, "", &manifest);
+ archive_file_with_content(archiver, pathbuf, "", &manifest);
}
bbsink_end_archive(sink);
@@ -636,6 +581,74 @@ perform_base_backup(basebackup_options *opt)
basebackup_progress_done();
}
+/*
+ * Iterate over the entire cluster and feed each tablespace to the archiver
+ * in turn.
+ */
+static void
+archive_database_cluster(List *tablespaces, bbarchiver *archiver,
+ StringInfo labelfile, StringInfo tblspc_map_file,
+ bool leave_main_tablespace_open,
+ backup_manifest_info *manifest)
+{
+ ListCell *lc;
+
+ /* Send off our tablespaces one by one */
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
+
+ bbarchiver_begin_tablespace(archiver, ti);
+
+ if (ti->path == NULL)
+ {
+ struct stat statbuf;
+ bool sendtblspclinks = true;
+
+ /* For the main tablespace, archive the backup_label first... */
+ archive_file_with_content(archiver, BACKUP_LABEL_FILE,
+ labelfile->data, manifest);
+
+ /* Then the tablespace_map file, if present... */
+ if (tblspc_map_file != NULL)
+ {
+ archive_file_with_content(archiver, TABLESPACE_MAP,
+ tblspc_map_file->data,
+ manifest);
+ sendtblspclinks = false;
+ }
+
+ /* Then the bulk of the files... */
+ archive_directory(archiver, ".", 1, tablespaces,
+ sendtblspclinks, manifest, NULL);
+
+ /* ... and pg_control after everything else. */
+ if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ XLOG_CONTROL_FILE)));
+ archive_file(archiver, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE,
+ &statbuf, false, InvalidOid, manifest, NULL);
+ }
+ else
+ archive_tablespace(archiver, ti->path, ti->oid, manifest);
+
+ /*
+ * If we were asked to leave the main tablespace open, then do so.
+ * This is safe since the main data directory is always sent *last*,
+ * so we'll not try to begin another tablespace without ending this
+ * one.
+ */
+ if (leave_main_tablespace_open && ti->path == NULL)
+ {
+ Assert(lnext(tablespaces, lc) == NULL);
+ }
+ else
+ bbarchiver_end_tablespace(archiver);
+ }
+}
+
/*
* list_sort comparison function, to compare log/seg portion of WAL segment
* filenames, ignoring the timeline portion.
@@ -846,18 +859,20 @@ SendBaseBackup(BaseBackupCmd *cmd)
}
/*
- * Inject a file with given name and content in the output tar stream.
+ * Feed the archiver a file that does not actually exist in the source
+ * directory. We use this to inject things like the backup_label file into
+ * the backup.
*/
static void
-sendFileWithContent(bbsink *sink, const char *filename, const char *content,
- backup_manifest_info *manifest)
+archive_file_with_content(bbarchiver *archiver, const char *filename,
+ const char *content, backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
- len;
+ int len;
pg_checksum_context checksum_ctx;
- pg_checksum_init(&checksum_ctx, manifest->checksum_type);
+ if (manifest != NULL)
+ pg_checksum_init(&checksum_ctx, manifest->checksum_type);
len = strlen(content);
@@ -877,22 +892,17 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- bbsink_archive_contents(sink, content, len);
+ bbarchiver_begin_file(archiver, filename, &statbuf);
+ if (bbarchiver_needs_file_contents(archiver))
+ bbarchiver_file_contents(archiver, content, len);
+ bbarchiver_end_file(archiver);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (manifest != NULL)
{
- char buf[TAR_BLOCK_SIZE];
-
- MemSet(buf, 0, pad);
- bbsink_archive_contents(sink, buf, pad);
+ pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
+ AddFileToBackupManifest(manifest, NULL, filename, len,
+ (pg_time_t) statbuf.st_mtime, &checksum_ctx);
}
-
- pg_checksum_update(&checksum_ctx, (uint8 *) content, len);
- AddFileToBackupManifest(manifest, NULL, filename, len,
- (pg_time_t) statbuf.st_mtime, &checksum_ctx);
}
/*
@@ -902,11 +912,10 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
*
* Only used to send auxiliary tablespaces, not PGDATA.
*/
-static int64
-sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
- backup_manifest_info *manifest)
+static void
+archive_tablespace(bbarchiver *archiver, char *path, char *spcoid,
+ backup_manifest_info *manifest)
{
- int64 size;
char pathbuf[MAXPGPATH];
struct stat statbuf;
@@ -930,17 +939,14 @@ sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
pathbuf)));
/* If the tablespace went away while scanning, it's no error. */
- return 0;
+ return;
}
- size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
- sizeonly);
+ bbarchiver_directory(archiver, TABLESPACE_VERSION_DIRECTORY, &statbuf);
/* Send all the files in the tablespace version directory */
- size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
- spcoid);
-
- return size;
+ archive_directory(archiver, pathbuf, strlen(path), NIL, true, manifest,
+ spcoid);
}
/*
@@ -955,16 +961,15 @@ sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
* information in the tar file. If not, we can skip that
* as it will be sent separately in the tablespace_map file.
*/
-static int64
-sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
- List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
- const char *spcoid)
+static void
+archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
+ List *tablespaces, bool sendtblspclinks,
+ backup_manifest_info *manifest, const char *spcoid)
{
DIR *dir;
struct dirent *de;
char pathbuf[MAXPGPATH * 2];
struct stat statbuf;
- int64 size = 0;
const char *lastDir; /* Split last dir from parent path. */
bool isDbDir = false; /* Does this directory contain relations? */
@@ -1117,8 +1122,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
excludeFound = true;
break;
}
@@ -1135,8 +1140,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
continue;
}
@@ -1149,15 +1154,15 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
- &statbuf, sizeonly);
+ bbarchiver_directory(archiver, "./pg_wal/archive_status",
+ &statbuf);
continue; /* don't recurse into pg_wal */
}
@@ -1188,8 +1193,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
- &statbuf, sizeonly);
+ bbarchiver_symbolic_link(archiver, pathbuf + basepathlen + 1,
+ linkpath, &statbuf);
#else
/*
@@ -1212,8 +1217,8 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ bbarchiver_directory(archiver, pathbuf + basepathlen + 1,
+ &statbuf);
/*
* Call ourselves recursively for a directory, unless it happens
@@ -1244,36 +1249,21 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
- sendtblspclinks, manifest, spcoid);
+ archive_directory(archiver, pathbuf, basepathlen, tablespaces,
+ sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
{
- bool sent = false;
-
- if (!sizeonly)
- sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
- true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
- manifest, spcoid);
-
- if (sent || sizeonly)
- {
- /* Add size. */
- size += statbuf.st_size;
-
- /* Pad to a multiple of the tar block size. */
- size += tarPaddingBytesRequired(statbuf.st_size);
-
- /* Size of the header for the file. */
- size += TAR_BLOCK_SIZE;
- }
+ archive_file(archiver, pathbuf, pathbuf + basepathlen + 1,
+ &statbuf, true,
+ isDbDir ? atooid(lastDir + 1) : InvalidOid,
+ manifest, spcoid);
}
else
ereport(WARNING,
(errmsg("skipping special file \"%s\"", pathbuf)));
}
FreeDir(dir);
- return size;
}
/*
@@ -1324,14 +1314,11 @@ is_checksummed_file(const char *fullpath, const char *filename)
*
* If dboid is anything other than InvalidOid then any checksum failures detected
* will get reported to the stats collector.
- *
- * Returns true if the file was successfully sent, false if 'missing_ok',
- * and the file did not exist.
*/
-static bool
-sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
- struct stat *statbuf, bool missing_ok, Oid dboid,
- backup_manifest_info *manifest, const char *spcoid)
+static void
+archive_file(bbarchiver *archiver, const char *readfilename,
+ const char *tarfilename, struct stat *statbuf, bool missing_ok,
+ Oid dboid, backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
@@ -1343,27 +1330,33 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
bool verify_checksum = false;
pg_checksum_context checksum_ctx;
-	pg_checksum_init(&checksum_ctx, manifest->checksum_type);
+	if (manifest != NULL)
+		pg_checksum_init(&checksum_ctx, manifest->checksum_type);
+
+	/*
+	 * If the archiver has no use for the file contents (e.g. it is only
+	 * estimating sizes), don't bother opening the file at all.
+	 */
+	if (!bbarchiver_needs_file_contents(archiver))
+	{
+		bbarchiver_begin_file(archiver, tarfilename, statbuf);
+		bbarchiver_end_file(archiver);
+		return;
+	}
 	fd = OpenTransientFile(readfilename, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		if (errno == ENOENT && missing_ok)
-			return false;
+			return;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m", readfilename)));
 	}
-	_tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
-
+	/*
+	 * Don't announce the file to the archiver until it is known to exist;
+	 * otherwise a vanished file would leave a dangling archive header.
+	 */
+	bbarchiver_begin_file(archiver, tarfilename, statbuf);
+
if (!noverify_checksums && DataChecksumsEnabled())
{
char *filename;
@@ -1517,10 +1510,11 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
}
}
- bbsink_archive_contents(sink, buf, cnt);
+ bbarchiver_file_contents(archiver, buf, cnt);
/* Also feed it to the checksum machinery. */
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
+ if (manifest != NULL)
+ pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
}
@@ -1532,23 +1526,14 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
while (len < statbuf->st_size)
{
cnt = Min(sizeof(buf), statbuf->st_size - len);
- bbsink_archive_contents(sink, buf, cnt);
- pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
+ bbarchiver_file_contents(archiver, buf, cnt);
+ if (manifest != NULL)
+ pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt);
len += cnt;
}
}
- /*
- * Pad to a block boundary, per tar format requirements. (This small
- * piece of data is probably not worth throttling, and is not checksummed
- * because it's not actually part of the file.)
- */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- bbsink_archive_contents(sink, buf, pad);
- }
+ bbarchiver_end_file(archiver);
CloseTransientFile(fd);
@@ -1565,54 +1550,15 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
total_checksum_failures += checksum_failures;
- AddFileToBackupManifest(manifest, spcoid, tarfilename, statbuf->st_size,
- (pg_time_t) statbuf->st_mtime, &checksum_ctx);
-
- return true;
-}
-
-
-static int64
-_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly)
-{
- char h[TAR_BLOCK_SIZE];
- enum tarError rc;
-
- if (!sizeonly)
- {
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
- statbuf->st_mtime);
-
- switch (rc)
- {
- case TAR_OK:
- break;
- case TAR_NAME_TOO_LONG:
- ereport(ERROR,
- (errmsg("file name too long for tar format: \"%s\"",
- filename)));
- break;
- case TAR_SYMLINK_TOO_LONG:
- ereport(ERROR,
- (errmsg("symbolic link target too long for tar format: "
- "file name \"%s\", target \"%s\"",
- filename, linktarget)));
- break;
- default:
- elog(ERROR, "unrecognized tar error: %d", rc);
- }
-
- bbsink_archive_contents(sink, h, sizeof(h));
- }
-
- return sizeof(h);
+ if (manifest != NULL)
+ AddFileToBackupManifest(manifest, spcoid, tarfilename,
+ statbuf->st_size,
+ (pg_time_t) statbuf->st_mtime, &checksum_ctx);
}
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
+ * directory.
*/
static void
convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
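To make the new control flow concrete, here is a minimal sketch -- mine,
not part of the patch -- of the callback sequence that archive_tablespace()
and archive_file() now drive, assuming the declarations from
basebackup_archiver.h below. Recursion, error handling, and checksumming
are elided.

/* Sketch only: what a bbarchiver sees for a tablespace with one file. */
static void
visit_one_file_sketch(bbarchiver *archiver, tablespaceinfo *tsinfo,
					  const char *relative_path, struct stat *statbuf,
					  int fd)
{
	char		buf[32768];
	ssize_t		cnt;

	bbarchiver_begin_tablespace(archiver, tsinfo);
	bbarchiver_begin_file(archiver, relative_path, statbuf);

	/* Contents are streamed only if this archiver wants them. */
	if (bbarchiver_needs_file_contents(archiver))
		while ((cnt = read(fd, buf, sizeof(buf))) > 0)
			bbarchiver_file_contents(archiver, buf, cnt);

	bbarchiver_end_file(archiver);
	bbarchiver_end_tablespace(archiver);
}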
diff --git a/src/backend/replication/basebackup_tar.c b/src/backend/replication/basebackup_tar.c
new file mode 100644
index 0000000000..8618f2a0ec
--- /dev/null
+++ b/src/backend/replication/basebackup_tar.c
@@ -0,0 +1,266 @@
+#include "postgres.h"
+
+#include "pgtar.h"
+#include "replication/basebackup_archiver.h"
+
+typedef struct bbarchiver_tar
+{
+ bbarchiver base;
+ bbsink *sink;
+ size_t file_len;
+} bbarchiver_tar;
+
+typedef struct bbarchiver_tarsize
+{
+ bbarchiver base;
+ tablespaceinfo *tsinfo;
+} bbarchiver_tarsize;
+
+static void bbarchiver_tar_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+static void bbarchiver_tar_end_tablespace(bbarchiver *archiver);
+static void bbarchiver_tar_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tar_file_contents(bbarchiver *archiver,
+ const char *data,
+ size_t len);
+static void bbarchiver_tar_end_file(bbarchiver *archiver);
+static void bbarchiver_tar_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tar_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+static void report_tar_error(enum tarError rc, const char *filename,
+ const char *linktarget);
+
+static void bbarchiver_tarsize_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+static void bbarchiver_tarsize_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tarsize_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+static void bbarchiver_tarsize_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+static void add_tar_size(bbarchiver *archiver, uint64 file_size);
+
+const bbarchiver_ops bbarchiver_tar_ops = {
+ .begin_tablespace = bbarchiver_tar_begin_tablespace,
+ .end_tablespace = bbarchiver_tar_end_tablespace,
+ .begin_file = bbarchiver_tar_begin_file,
+ .file_contents = bbarchiver_tar_file_contents,
+ .end_file = bbarchiver_tar_end_file,
+ .directory = bbarchiver_tar_directory,
+ .symbolic_link = bbarchiver_tar_symbolic_link,
+};
+
+const bbarchiver_ops bbarchiver_tarsize_ops = {
+ .begin_tablespace = bbarchiver_tarsize_begin_tablespace,
+ .end_tablespace = bbarchiver_noop_end_tablespace,
+ .begin_file = bbarchiver_tarsize_begin_file,
+ .file_contents = NULL,
+ .end_file = bbarchiver_noop_end_file,
+ .directory = bbarchiver_tarsize_directory,
+ .symbolic_link = bbarchiver_tarsize_symbolic_link,
+};
+
+bbarchiver *
+bbarchiver_tar_new(bbsink *sink)
+{
+ bbarchiver_tar *archiver = palloc0(sizeof(bbarchiver_tar));
+
+ *((const bbarchiver_ops **) &archiver->base.bba_ops) = &bbarchiver_tar_ops;
+ archiver->base.bba_next = NULL;
+ archiver->sink = sink;
+
+ return &archiver->base;
+}
+
+static void
+bbarchiver_tar_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char *archive_name = "base.tar";
+
+ if (tsinfo->path != NULL)
+ archive_name = psprintf("%s.tar", tsinfo->oid);
+
+ bbsink_begin_archive(myarchiver->sink, archive_name);
+}
+
+static void
+bbarchiver_tar_end_tablespace(bbarchiver *archiver)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+
+ bbsink_end_archive(myarchiver->sink);
+}
+
+static void
+bbarchiver_tar_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char h[TAR_BLOCK_SIZE];
+ enum tarError rc;
+
+ myarchiver->file_len = 0;
+
+ rc = tarCreateHeader(h, relative_path, NULL, statbuf->st_size,
+ statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ statbuf->st_mtime);
+ if (rc != TAR_OK)
+ report_tar_error(rc, relative_path, NULL);
+
+ bbsink_archive_contents(myarchiver->sink, h, sizeof(h));
+}
+
+static void
+bbarchiver_tar_file_contents(bbarchiver *archiver, const char *data,
+ size_t len)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+
+ myarchiver->file_len += len;
+ bbsink_archive_contents(myarchiver->sink, data, len);
+}
+
+static void
+bbarchiver_tar_end_file(bbarchiver *archiver)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ int pad;
+
+ /* Pad to a block boundary, per tar format requirements. */
+ pad = tarPaddingBytesRequired(myarchiver->file_len);
+ if (pad > 0)
+ {
+ char buf[TAR_BLOCK_SIZE];
+
+ MemSet(buf, 0, pad);
+ bbsink_archive_contents(myarchiver->sink, buf, pad);
+ }
+}
+
+static void
+bbarchiver_tar_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char h[TAR_BLOCK_SIZE];
+ enum tarError rc;
+
+ rc = tarCreateHeader(h, relative_path, NULL, statbuf->st_size,
+ statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ statbuf->st_mtime);
+ if (rc != TAR_OK)
+ report_tar_error(rc, relative_path, NULL);
+
+ bbsink_archive_contents(myarchiver->sink, h, sizeof(h));
+}
+
+static void
+bbarchiver_tar_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ bbarchiver_tar *myarchiver = (bbarchiver_tar *) archiver;
+ char h[TAR_BLOCK_SIZE];
+ enum tarError rc;
+
+ rc = tarCreateHeader(h, relative_path, linktarget, statbuf->st_size,
+ statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ statbuf->st_mtime);
+ if (rc != TAR_OK)
+		report_tar_error(rc, relative_path, linktarget);
+
+ bbsink_archive_contents(myarchiver->sink, h, sizeof(h));
+}
+
+static void
+report_tar_error(enum tarError rc, const char *filename,
+ const char *linktarget)
+{
+ switch (rc)
+ {
+ case TAR_OK:
+ break;
+ case TAR_NAME_TOO_LONG:
+ ereport(ERROR,
+ (errmsg("file name too long for tar format: \"%s\"",
+ filename)));
+ break;
+ case TAR_SYMLINK_TOO_LONG:
+ Assert(linktarget != NULL);
+ ereport(ERROR,
+ (errmsg("symbolic link target too long for tar format: "
+ "file name \"%s\", target \"%s\"",
+ filename, linktarget)));
+ break;
+ default:
+ elog(ERROR, "unrecognized tar error: %d", rc);
+ }
+}
+
+/*
+ * Create an archiver that calculates an estimated size for a tar file built
+ * from the files visited.
+ */
+bbarchiver *
+bbarchiver_tarsize_new(void)
+{
+ bbarchiver_tarsize *archiver = palloc0(sizeof(bbarchiver_tarsize));
+
+ *((const bbarchiver_ops **) &archiver->base.bba_ops) =
+ &bbarchiver_tarsize_ops;
+ archiver->base.bba_next = NULL;
+
+ return &archiver->base;
+}
+
+static void
+bbarchiver_tarsize_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo)
+{
+ bbarchiver_tarsize *myarchiver = (bbarchiver_tarsize *) archiver;
+
+ myarchiver->tsinfo = tsinfo;
+ tsinfo->size = 0;
+}
+
+static void
+bbarchiver_tarsize_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ add_tar_size(archiver, statbuf->st_size);
+}
+
+static void
+bbarchiver_tarsize_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ add_tar_size(archiver, 0);
+}
+
+static void
+bbarchiver_tarsize_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf)
+{
+ add_tar_size(archiver, 0);
+}
+
+static void
+add_tar_size(bbarchiver *archiver, uint64 file_size)
+{
+ bbarchiver_tarsize *myarchiver = (bbarchiver_tarsize *) archiver;
+
+ myarchiver->tsinfo->size +=
+ TAR_BLOCK_SIZE + file_size + tarPaddingBytesRequired(file_size);
+}
--
2.24.3 (Apple Git-128)
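As a usage note (mine, not from the patch): the two constructors above are
meant to support the existing two-pass behavior, one pass to estimate sizes
when progress reporting is requested and one to produce the actual archive.
In outline, with path, spcoid, tsinfo, and manifest assumed to be in scope,
and bbsink_libpq_new() standing in for however the sink is obtained from
the earlier patches in this series:

	/* Sketch only. */
	bbsink	   *sink = bbsink_libpq_new();
	bbarchiver *sizer = bbarchiver_tarsize_new();
	bbarchiver *archiver = bbarchiver_tar_new(sink);

	/* Pass 1: visit everything; tsinfo->size accumulates the estimate. */
	bbarchiver_begin_tablespace(sizer, tsinfo);
	archive_tablespace(sizer, path, spcoid, NULL);
	bbarchiver_end_tablespace(sizer);

	/* Pass 2: emit the real tar archive through the sink. */
	bbarchiver_begin_tablespace(archiver, tsinfo);
	archive_tablespace(archiver, path, spcoid, manifest);
	bbarchiver_end_tablespace(archiver);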
Attachment: v2-0007-Introduce-bbarchiver-abstraction.patch
From 9890dbd31017f8421b2ea33b22f4484c9a75ac86 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 16:43:16 -0400
Subject: [PATCH v2 07/11] Introduce bbarchiver abstraction.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup_archiver.c | 119 +++++++++++
src/include/replication/basebackup_archiver.h | 195 ++++++++++++++++++
3 files changed, 315 insertions(+)
create mode 100644 src/backend/replication/basebackup_archiver.c
create mode 100644 src/include/replication/basebackup_archiver.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 7de4f82882..aacccd350d 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_archiver.o \
basebackup_libpq.o \
basebackup_progress.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup_archiver.c b/src/backend/replication/basebackup_archiver.c
new file mode 100644
index 0000000000..045a8a088e
--- /dev/null
+++ b/src/backend/replication/basebackup_archiver.c
@@ -0,0 +1,119 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_archiver.c
+ * general supporting code for basebackup archiver implementations
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_archiver.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "replication/basebackup_archiver.h"
+
+/* Pass begin_tablespace callback to next bbarchiver. */
+void
+bbarchiver_forward_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_begin_tablespace(archiver->bba_next, tsinfo);
+}
+
+/* Pass end_tablespace callback to next bbarchiver. */
+void
+bbarchiver_forward_end_tablespace(bbarchiver *archiver)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_end_tablespace(archiver->bba_next);
+}
+
+/* Pass begin_file callback to next bbarchiver. */
+void
+bbarchiver_forward_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_begin_file(archiver->bba_next, relative_path, statbuf);
+}
+
+/* Pass file_contents callback to next bbarchiver. */
+void
+bbarchiver_forward_file_contents(bbarchiver *archiver, const char *data, size_t len)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_file_contents(archiver->bba_next, data, len);
+}
+
+/* Pass end_file callback to next bbarchiver. */
+void
+bbarchiver_forward_end_file(bbarchiver *archiver)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_end_file(archiver->bba_next);
+}
+
+/* Pass directory callback to next bbarchiver. */
+void
+bbarchiver_forward_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_directory(archiver->bba_next, relative_path, statbuf);
+}
+
+/* Pass symbolic_link callback to next bbarchiver. */
+void
+bbarchiver_forward_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ Assert(archiver->bba_next != NULL);
+ bbarchiver_symbolic_link(archiver->bba_next, relative_path, linktarget, statbuf);
+}
+
+/* Ignore begin_tablespace callback. */
+void
+bbarchiver_noop_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ /* Do nothing */
+}
+
+/* Ignore end_tablespace callback. */
+void
+bbarchiver_noop_end_tablespace(bbarchiver *archiver)
+{
+ /* Do nothing */
+}
+
+/* Ignore begin_file callback. */
+void
+bbarchiver_noop_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ /* Do nothing */
+}
+
+/* Ignore end_file callback. */
+void
+bbarchiver_noop_end_file(bbarchiver *archiver)
+{
+ /* Do nothing */
+}
+
+/* Ignore directory callback. */
+void
+bbarchiver_noop_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ /* Do nothing */
+}
+
+/* Ignore symbolic_link callback. */
+void
+bbarchiver_noop_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ /* Do nothing */
+}
diff --git a/src/include/replication/basebackup_archiver.h b/src/include/replication/basebackup_archiver.h
new file mode 100644
index 0000000000..fce0afa167
--- /dev/null
+++ b/src/include/replication/basebackup_archiver.h
@@ -0,0 +1,195 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_archiver.h
+ * iterate over files, directories, and symbolic links encountered as
+ * part of the base backup process
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_archiver.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_ARCHIVER_H
+#define BASEBACKUP_ARCHIVER_H
+
+#include <sys/stat.h>
+
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+struct bbarchiver;
+struct bbarchiver_ops;
+typedef struct bbarchiver bbarchiver;
+typedef struct bbarchiver_ops bbarchiver_ops;
+
+/*
+ * Common data for any type of basebackup archiver.
+ *
+ * 'bba_ops' is the relevant callback table.
+ *
+ * 'bba_next' is a pointer to another bbarchiver to which this bbarchiver is
+ * forwarding some or all operations.
+ *
+ * If a bbarchiver needs to store additional state, it can allocate a larger
+ * structure whose first element is a bbarchiver.
+ */
+struct bbarchiver
+{
+ const bbarchiver_ops *bba_ops;
+ bbarchiver *bba_next;
+};
+
+/*
+ * Callbacks for a backup archiver.
+ *
+ * Except as otherwise noted, all of these callbacks are required. If a particular
+ * callback just needs to forward the call to archiver->bba_next, use
+ * bbarchiver_forward_<callback_name> as the callback. If a particular (required)
+ * callback doesn't need to do anything at all, use bbarchiver_noop_<callback_name>
+ * as the callback.
+ *
+ * Callers should always invoke these callbacks via the bbarchiver_*
+ * inline functions rather than calling them directly.
+ */
+struct bbarchiver_ops
+{
+ /* These callbacks are invoked just before and after visiting each tablespace. */
+ void (*begin_tablespace)(bbarchiver *archiver, tablespaceinfo *tsinfo);
+ void (*end_tablespace)(bbarchiver *archiver);
+
+ /* This callback is invoked each time we begin visiting a plain file. */
+ void (*begin_file)(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf);
+
+ /*
+ * This callback is invoked one or more times for each plain file, with the
+ * contents of the file passed to it chunk by chunk.
+ *
+ * It is optional. If NULL, the file is not read.
+ */
+ void (*file_contents)(bbarchiver *archiver, const char *data,
+ size_t len);
+
+ /* This callback is invoked each time we finish visiting a plain file. */
+ void (*end_file)(bbarchiver *archiver);
+
+ /* This method gets called each time we visit a directory. */
+ void (*directory)(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf);
+
+ /* This method gets called each time we visit a symbolic link. */
+ void (*symbolic_link)(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf);
+};
+
+/* Dummy callbacks for when a bbarchiver wants to forward operations. */
+extern void bbarchiver_forward_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+extern void bbarchiver_forward_end_tablespace(bbarchiver *archiver);
+extern void bbarchiver_forward_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+extern void bbarchiver_forward_file_contents(bbarchiver *archiver,
+ const char *data, size_t len);
+extern void bbarchiver_forward_end_file(bbarchiver *archiver);
+extern void bbarchiver_forward_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+extern void bbarchiver_forward_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+
+/* Dummy callbacks for when a bbarchiver wants to do nothing. */
+extern void bbarchiver_noop_begin_tablespace(bbarchiver *archiver,
+ tablespaceinfo *tsinfo);
+extern void bbarchiver_noop_end_tablespace(bbarchiver *archiver);
+extern void bbarchiver_noop_begin_file(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+/* if there's nothing to do for file contents, omit callback! */
+extern void bbarchiver_noop_end_file(bbarchiver *archiver);
+extern void bbarchiver_noop_directory(bbarchiver *archiver,
+ const char *relative_path,
+ struct stat *statbuf);
+extern void bbarchiver_noop_symbolic_link(bbarchiver *archiver,
+ const char *relative_path,
+ const char *linktarget,
+ struct stat *statbuf);
+
+/* Begin visiting a tablespace. */
+static inline void
+bbarchiver_begin_tablespace(bbarchiver *archiver, tablespaceinfo *tsinfo)
+{
+ Assert(archiver->bba_ops->begin_tablespace != NULL);
+ archiver->bba_ops->begin_tablespace(archiver, tsinfo);
+}
+
+/* Finish visiting a tablespace. */
+static inline void
+bbarchiver_end_tablespace(bbarchiver *archiver)
+{
+ Assert(archiver->bba_ops->end_tablespace != NULL);
+ archiver->bba_ops->end_tablespace(archiver);
+}
+
+/* Begin visiting a plain file. */
+static inline void
+bbarchiver_begin_file(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_ops->begin_file != NULL);
+ archiver->bba_ops->begin_file(archiver, relative_path, statbuf);
+}
+
+/* Does this archiver need the contents of the files? */
+static inline bool
+bbarchiver_needs_file_contents(bbarchiver *archiver)
+{
+ return archiver->bba_ops->file_contents != NULL;
+}
+
+/*
+ * Process contents of a plain file.
+ *
+ * Don't call this unless bbarchiver_needs_file_contents returns true.
+ */
+static inline void
+bbarchiver_file_contents(bbarchiver *archiver, const char *data, size_t len)
+{
+ Assert(archiver->bba_ops->file_contents != NULL);
+ archiver->bba_ops->file_contents(archiver, data, len);
+}
+
+/* Finish visiting a plain file. */
+static inline void
+bbarchiver_end_file(bbarchiver *archiver)
+{
+ Assert(archiver->bba_ops->end_file != NULL);
+ archiver->bba_ops->end_file(archiver);
+}
+
+/* Visit a directory. */
+static inline void
+bbarchiver_directory(bbarchiver *archiver, const char *relative_path,
+ struct stat *statbuf)
+{
+ Assert(archiver->bba_ops->directory != NULL);
+ archiver->bba_ops->directory(archiver, relative_path, statbuf);
+}
+
+/* Visit a symbolic link. */
+static inline void
+bbarchiver_symbolic_link(bbarchiver *archiver, const char *relative_path,
+ const char *linktarget, struct stat *statbuf)
+{
+ Assert(archiver->bba_ops->symbolic_link != NULL);
+ archiver->bba_ops->symbolic_link(archiver, relative_path, linktarget, statbuf);
+}
+
+/* Constructors for various types of archivers. */
+extern bbarchiver *bbarchiver_tar_new(bbsink *sink);
+extern bbarchiver *bbarchiver_tarsize_new(void);
+
+#endif
--
2.24.3 (Apple Git-128)
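To illustrate the extension point (example mine, not part of the patch): a
pass-through bbarchiver that counts plain files while forwarding everything
else unchanged can be built almost entirely from the forward_* helpers:

/* Sketch only: count files; forward all callbacks to the next archiver. */
typedef struct bbarchiver_counter
{
	bbarchiver	base;
	uint64		nfiles;
} bbarchiver_counter;

static void
bbarchiver_counter_begin_file(bbarchiver *archiver,
							  const char *relative_path,
							  struct stat *statbuf)
{
	((bbarchiver_counter *) archiver)->nfiles++;
	bbarchiver_forward_begin_file(archiver, relative_path, statbuf);
}

static const bbarchiver_ops bbarchiver_counter_ops = {
	.begin_tablespace = bbarchiver_forward_begin_tablespace,
	.end_tablespace = bbarchiver_forward_end_tablespace,
	.begin_file = bbarchiver_counter_begin_file,
	.file_contents = bbarchiver_forward_file_contents,
	.end_file = bbarchiver_forward_end_file,
	.directory = bbarchiver_forward_directory,
	.symbolic_link = bbarchiver_forward_symbolic_link,
};

bbarchiver *
bbarchiver_counter_new(bbarchiver *next)
{
	bbarchiver_counter *archiver = palloc0(sizeof(bbarchiver_counter));

	*((const bbarchiver_ops **) &archiver->base.bba_ops) =
		&bbarchiver_counter_ops;
	archiver->base.bba_next = next;
	return &archiver->base;
}

One wrinkle worth noting: because file_contents is non-NULL here,
bbarchiver_needs_file_contents() will report true even when the next
archiver would not otherwise have needed the contents.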
Attachment: v2-0010-WIP-Introduce-bbstreamer-abstration-and-adapt-pg_.patch
From 42a8ce6765297eeb855478ebb8f312ffd33056b5 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 28 Jul 2020 14:04:20 -0400
Subject: [PATCH v2 10/11] WIP: Introduce bbstreamer abstraction and adapt
 pg_basebackup to use it.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 ++++++
src/bin/pg_basebackup/bbstreamer_file.c | 571 ++++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 246 +++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 440 ++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 779 +++-------------------
6 files changed, 1568 insertions(+), 697 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 988007c6fd..68df5dd6e6 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -27,10 +27,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -53,7 +59,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..dd4ded3880
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should exactly BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content)(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize)(bbstreamer *streamer);
+ void (*free)(bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(char *basepath,
+ const char *(*link_map)(const char *),
+ void (*report_output_file)(const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
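A quick illustration (mine, not from the patch) of how a parsing bbstreamer
is expected to use bbstreamer_buffer_until() to assemble fixed-size records
from arbitrarily-sized input chunks; TAR_BLOCK_SIZE here comes from pgtar.h:

/* Sketch only: buffer input until a full 512-byte tar header is present. */
static bool
buffer_tar_header_sketch(bbstreamer *streamer, const char **data, int *len)
{
	if (!bbstreamer_buffer_until(streamer, data, len, TAR_BLOCK_SIZE))
		return false;			/* need more input; retry on next chunk */

	/* streamer->bbs_buffer.data now holds the complete header. */
	return true;
}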
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..ca34871aef
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,571 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map)(const char *);
+ void (*report_output_file)(const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error-reporting purposes; if file is NULL, it also identifies the
+ * file that is opened and closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new(), we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any symbolic
+ * link, and which should return a replacement pathname to be used in its place.
+ * If NULL, the symbolic link target is used without modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a new
+ * output file. The pathname to that file is passed as an argument. If NULL, the
+ * call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(char *basepath,
+ const char *(*link_map)(const char *),
+ void (*report_output_file)(const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = basepath;
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
+ * clusters) will have been created by the wal receiver
+ * process. Also, when the WAL directory location was
+ * specified, pg_wal (or pg_xlog) has already been created
+ * as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related
+ * directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ *
+ * We don't own any memory beyond the bbstreamer itself.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ pfree(streamer);
+}
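Putting the pieces in this file together (example mine): extracting a tar
stream received from the server amounts to chaining a parser in front of an
extractor. Here copybuf and r stand in for one raw chunk from the COPY
stream, and basepath for the target directory:

	/* Sketch only. */
	bbstreamer *extractor = bbstreamer_extractor_new(basepath, NULL, NULL);
	bbstreamer *streamer = bbstreamer_tar_parser_new(extractor);

	/* Feed each raw chunk; the parser retypes it for the extractor. */
	bbstreamer_content(streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);

	/* At end of stream: */
	bbstreamer_finalize(streamer);
	bbstreamer_free(streamer);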
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..3ec10811df
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,246 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can inject recovery configuration into an
+ * archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+extern bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf;
+ * on older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+			/* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+			/* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..6596f3a553
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,440 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+extern bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /*
+ * If we're expecting an archive member header, accumulate
+ * a full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the
+ * file trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ if (mystreamer->base.bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not
+ * the start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+}
+
+/*
+ * Create a bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+extern bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size is a fatal error (and
+ * indicates a bug).
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * blocks of zero bytes. Not all tar programs require this, but apparently
+ * some do. The server does not supply this trailer. If no archive trailer is
+ * present, one will be added by bbstreamer_tar_parser_finalize.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+ /* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0645e983c6..8f64a0bdf9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -27,6 +27,7 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -60,34 +61,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -159,10 +135,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -188,13 +165,11 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force);
static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -357,21 +332,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -760,6 +720,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -769,7 +737,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* force parameter is set to true.
*/
static void
-progress_report(int tablespacenum, const char *filename, bool force)
+progress_report(int tablespacenum, bool force)
{
int percent;
char totaldone_str[32];
@@ -809,7 +777,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -825,7 +793,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -839,7 +807,7 @@ progress_report(int tablespacenum, const char *filename, bool force)
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -989,241 +957,94 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
static void
writeTarData(WriteTarState *state, char *buf, int r)
{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
+ bbstreamer_content(state->streamer, NULL, buf, r, BBSTREAMER_UNKNOWN);
}
/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
+ * Receive a tar format file from the connection to the server, and perform
+ * the appropriate processing. Depending on the selected output format, we
+ * may either write the results directly to a tar file, or compress it first,
+ * or extract the tar file instead of writing it directly.
*
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
+ * If we write the data out to a tar file, it will be named base.tar[.gz] if it's
+ * for the main data directory or <tablespaceoid>.tar[.gz] if it's for another
+ * tablespace.
*/
static void
ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
{
- char zerobuf[TAR_BLOCK_SIZE * 2];
+ FILE *archive_file = NULL;
WriteTarState state;
+ bool basetablespace;
+ bbstreamer *streamer;
+ char filename[MAXPGPATH];
memset(&state, 0, sizeof(state));
state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
-
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ basetablespace = PQgetisnull(res, rownum, 0);
- if (state.basetablespace)
+ if (format == 'p')
{
- /*
- * Base tablespaces
- */
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
+ char current_path[MAXPGPATH];
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
+ if (basetablespace)
+ strlcpy(current_path, basedir, sizeof(current_path));
else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ strlcpy(current_path,
+ get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
+ sizeof(current_path));
+
+ streamer = bbstreamer_extractor_new(current_path,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
- /*
- * Specific tablespace
- */
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (!basetablespace)
+ snprintf(filename, sizeof(filename),
+ "%s/%s.tar", basedir, PQgetvalue(res, rownum, 0));
+ else if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(filename, sizeof(filename), "-");
+ archive_file = stdout;
}
else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ snprintf(filename, sizeof(filename), "%s/base.tar", basedir);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(filename, ".gz", sizeof(filename));
+ streamer = bbstreamer_gzip_writer_new(filename, archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
-
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ streamer = bbstreamer_plain_writer_new(filename, archive_file);
- /*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
- */
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = filename;
+ }
- MemSet(zerobuf, 0, sizeof(zerobuf));
+ state.streamer = streamer;
- if (state.basetablespace && writerecoveryconf)
+ if (basetablespace && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ bool is_recovery_guc_supported;
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ is_recovery_guc_supported =
+ (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC);
+ state.streamer =
+ bbstreamer_recovery_injector_new(state.streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ state.streamer = bbstreamer_tar_parser_new(state.streamer);
- writeTarData(&state, header, sizeof(header));
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ progress_filename = NULL;
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1232,7 +1053,6 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
*/
if (strcmp(basedir, "-") == 0 && manifest)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
initPQExpBuffer(&buf);
@@ -1242,42 +1062,19 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
+
+ /*
+ * We inject into 'streamer' here, not 'state.streamer', so that we
+ * bypass the tar parser and recovery injector bbstreamer objects.
+ */
+ bbstreamer_inject_file(streamer, "backup_manifest", buf.data, buf.len);
termPQExpBuffer(&buf);
}
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
-
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
- }
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
- progress_report(rownum, state.filename, true);
+ progress_report(rownum, true);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1293,184 +1090,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
+ writeTarData(state, copybuf, r);
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
-
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false);
+ progress_report(state->tablespacenum, false);
}
@@ -1495,236 +1118,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2065,10 +1458,7 @@ BaseBackup(void)
*/
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
- else
- ReceiveAndUnpackTarFile(conn, res, i);
+ ReceiveTarFile(conn, res, i);
} /* Loop over all tablespaces */
/*
@@ -2086,7 +1476,8 @@ BaseBackup(void)
if (showprogress)
{
- progress_report(PQntuples(res), NULL, true);
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true);
if (isatty(fileno(stderr)))
fprintf(stderr, "\n"); /* Need to move to next line */
}
--
2.24.3 (Apple Git-128)
Attachment: v2-0011-POC-Embarrassingly-bad-server-side-compression-pa.patch (application/octet-stream)
From 8b6cc6797b4e4ada0e76b1c516d72271e8865497 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 29 Jul 2020 09:24:37 -0400
Subject: [PATCH v2 11/11] POC: Embarrassingly bad server-side compression
patch.
---
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 2 +
src/backend/replication/basebackup_gzip.c | 166 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 8 +-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 177 insertions(+), 3 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 9706a95848..8ff63bc77e 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 6b3c77f2c0..20399c6349 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_archiver.o \
+ basebackup_gzip.o \
basebackup_libpq.o \
basebackup_progress.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 30e242f99e..0c473b02e7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -248,6 +248,8 @@ perform_base_backup(basebackup_options *opt)
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
+ sink = bbsink_gzip_new(sink, 1);
+
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..05bd497f38
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,166 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <zlib.h>
+
+#include "replication/basebackup_sink.h"
+
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Compression buffer. */
+ const char *buffer;
+} bbsink_gzip;
+
+#define COMPRESS_BUFFER_SIZE 65536
+
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink,
+ const char *data, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that compresses the data using gzip and
+ * forwards the compressed data to a successor sink.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+}
+
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ mysink->buffer = palloc(COMPRESS_BUFFER_SIZE);
+ zs->next_out = (uint8 *) mysink->buffer;
+ zs->avail_out = COMPRESS_BUFFER_SIZE;
+
+ deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 31, 8,
+ Z_DEFAULT_STRATEGY);
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+static void
+bbsink_gzip_archive_contents(bbsink *sink, const char *data, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ zs->next_in = (uint8 *) data;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+ unsigned b4;
+
+ b4 = zs->avail_out;
+
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res != Z_OK)
+ elog(ERROR, "well that sucks");
+
+ if (zs->avail_out <= COMPRESS_BUFFER_SIZE / 2)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->buffer,
+ COMPRESS_BUFFER_SIZE - zs->avail_out);
+ zs->next_out = (uint8 *) mysink->buffer;
+ zs->avail_out = COMPRESS_BUFFER_SIZE;
+ }
+ }
+}
+
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+ int res;
+
+ Assert(zs->avail_in == 0);
+
+ elog(LOG, "bbsink_gzip_end_archive: reached");
+
+ do
+ {
+ res = deflate(zs, Z_FINISH);
+ if (res != Z_STREAM_END && res != Z_OK)
+ elog(ERROR, "this would also suck");
+ elog(LOG, "bbsink_gzip_end_archive: res = %d", (int) res);
+
+ if (zs->avail_out < COMPRESS_BUFFER_SIZE)
+ {
+ elog(LOG, "finish-putting %u bytes",
+ COMPRESS_BUFFER_SIZE - zs->avail_out);
+ bbsink_archive_contents(sink->bbs_next, mysink->buffer,
+ COMPRESS_BUFFER_SIZE - zs->avail_out);
+ zs->next_out = (uint8 *) mysink->buffer;
+ zs->avail_out = COMPRESS_BUFFER_SIZE;
+ }
+ } while (res != Z_STREAM_END);
+
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f64a0bdf9..bd16bf65f1 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -978,6 +978,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
bool basetablespace;
bbstreamer *streamer;
char filename[MAXPGPATH];
+ bool need_tar_parser = false;
memset(&state, 0, sizeof(state));
state.tablespacenum = rownum;
@@ -997,6 +998,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
streamer = bbstreamer_extractor_new(current_path,
get_tablespace_mapping,
progress_update_filename);
+ need_tar_parser = true;
}
else
{
@@ -1022,7 +1024,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
#endif
streamer = bbstreamer_plain_writer_new(filename, archive_file);
- streamer = bbstreamer_tar_archiver_new(streamer);
+ //streamer = bbstreamer_tar_archiver_new(streamer);
progress_filename = filename;
}
@@ -1038,9 +1040,11 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
bbstreamer_recovery_injector_new(state.streamer,
is_recovery_guc_supported,
recoveryconfcontents);
+ need_tar_parser = true;
}
- state.streamer = bbstreamer_tar_parser_new(state.streamer);
+ if (need_tar_parser)
+ state.streamer = bbstreamer_tar_parser_new(state.streamer);
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index bf2d71fafa..24be3eb3fa 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -177,6 +177,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_libpq_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
/* Extra interface functions for progress reporting. */
extern void basebackup_progress_wait_checkpoint(void);
--
2.24.3 (Apple Git-128)
Hi,
On 2020-07-29 11:31:26 -0400, Robert Haas wrote:
Here's an updated patch set. This is now rebased over master and
includes as 0001 the patch I posted separately at
/messages/by-id/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
but drops some other patches that were committed meanwhile. 0002-0009
of this series are basically the same as 0004-0011 from the previous
series, except for rebasing and fixing a bug I discovered in what's
now 0006. 0012 does a refactoring of pg_basebackup along similar lines
to the server-side refactoring from patches earlier in the series.
Have you tested whether this still works against older servers? Or do
you think we should not have that as a goal?
1. pg_basebackup -R injects recovery.conf (on older versions) or
injects standby.signal and appends to postgresql.auto.conf (on newer
versions) by parsing the tar file sent by the server and editing it on
the fly. From the point of view of server-side compression, this is
not ideal, because if you want to make these kinds of changes when
server-side compression is in use, you'd have to decompress the stream
on the client side in order to figure out where in the stream you ought
to inject your changes. But having to do that is a major expense. If
the client instead told the server what to change when generating the
archive, and the server did it, this expense could be avoided. It
would have the additional advantage that the backup manifest could
reflect the effects of those changes; right now it doesn't, and
pg_verifybackup just knows to expect differences in those files.
Hm. I don't think I terribly like the idea of things like -R having to
be processed server side. That'll be awfully annoying to keep working
across versions, for one. But perhaps the config file should just not be
in the main tar file going forward?
I think we should eventually be able to use one archive for multiple
purposes, e.g. to set up a standby as well as using it for a base
backup. Or multiple standbys with different tablespace remappings.
2. According to the comments, some tar programs require two tar blocks
(i.e. 512-byte blocks) of zero bytes at the end of an archive. The
server does not generate these blocks of zero bytes, so it basically
creates a tar file that works fine with my copy of tar but might break
with somebody else's. Instead, the client appends 1024 zero bytes to
the end of every file it receives from the server. That is an odd way
of fixing this problem, and it makes things rather inflexible. If the
server sends you any kind of a file OTHER THAN a tar file with the
last 1024 zero bytes stripped off, then adding 1024 zero bytes will be
the wrong thing to do. It would be better if the server just generated
fully correct tar files (whatever we think that means) and the client
wrote out exactly what it got from the server. Then, we could have the
server generate cpio archives or zip files or gzip-compressed tar
files or lz4-compressed tar files or anything we like, and the client
wouldn't really need to care as long as it didn't need to extract
those archives. That seems a lot cleaner.
Yea.
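
To make the trailer logic concrete, here is a minimal sketch of what a
fully correct tar writer has to emit at end of archive. This is purely
illustrative: tar_padding_bytes and emit_fn are invented names, though
the 512-byte block size matches the TAR_BLOCK_SIZE constant used in the
patches above.

#include <string.h>

#define TAR_BLOCK_SIZE 512

/* Zero padding needed to round a member's size up to a full tar block. */
static size_t
tar_padding_bytes(size_t len)
{
    return (TAR_BLOCK_SIZE - (len % TAR_BLOCK_SIZE)) % TAR_BLOCK_SIZE;
}

/* A complete tar archive ends with two all-zero 512-byte blocks. */
static void
emit_archive_trailer(void (*emit_fn) (const char *, size_t))
{
    char        zeros[2 * TAR_BLOCK_SIZE];

    memset(zeros, 0, sizeof(zeros));
    emit_fn(zeros, sizeof(zeros));
}

If the server emitted that trailer itself, the client could write
whatever bytes it receives verbatim, regardless of which archive format
or compression method was selected.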
5. As things stand today, the client must know exactly how many
archives it should expect to receive from the server and what each one
is. It can do that, because it knows to expect one archive per
tablespace, and the archive must be an uncompressed tarfile, so there
is no ambiguity. But, if the server could send archives to other
places, or send other kinds of archives to the client, then this would
become more complex. There is no intrinsic reason why the logic on the
client side can't simply be made more complicated in order to cope,
but it doesn't seem like great design, because then every time you
enhance the server, you've also got to enhance the client, and that
limits cross-version compatibility, and also seems more fragile. I
would rather that the server advertise the number of archives and the
names of each archive to the client explicitly, allowing the client to
be dumb unless it needs to post-process (e.g. extract) those archives.
ISTM that that can help to some degree, but things like tablespace
remapping etc IMO aren't best done server side, so I think the client
will continue to need to know about the contents to a significant
degree?
Putting all of the above together, what I propose - but have not yet
tried to implement - is a new COPY sub-protocol for taking base
backups. Instead of sending a COPY stream per archive, the server
would send a single COPY stream where the first byte of each message
is a type indicator, like we do with the replication sub-protocol
today. For example, if the first byte is 'a' that could indicate that
we're beginning a new archive and the rest of the message would
indicate the archive name and perhaps some flags or options. If the
first byte is 'p' that could indicate that we're sending archive
payload, perhaps with the first four bytes of the message being
progress, i.e. the number of newly-processed bytes on the server side
prior to any compression, and the remaining bytes being payload. On
receipt of such a message, the client would increment the progress
indicator by the value indicated in those first four bytes, and then
process the remaining bytes by writing them to a file or whatever
behavior the user selected via -Fp, -Ft, -Z, etc.
Wonder if there's a way to get this to be less stateful. It seems a bit
ugly that the client would know what the last 'a' was for a 'p'? Perhaps
we could actually make 'a' include an identifier for each archive, and
then 'p' would append to a specific archive? Which would then also
allow for concurrent processing of those archives on the server side.
I'd personally rather have a separate message type for progress and
payload. Seems odd to have to send payload messages with 0 payload just
because we want to update progress (in case of uploading to
e.g. S3). And I think it'd be nice if we could have a more extensible
progress measurement approach than a fixed length prefix. E.g. it might
be nice to allow it to report both the overall progress, as well as a
per archive progress. Or we might want to send progress when uploading
to S3, even when not having pre-calculated the total size of the data
directory.
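
As a purely illustrative sketch combining the quoted proposal with the
variations suggested above, the client-side dispatch might look like
this. All helper names (begin_new_archive, append_archive_payload,
update_progress) are invented, the integer types follow PostgreSQL
frontend conventions, and byte-order handling is elided; a real protocol
would of course define an explicit wire layout rather than this.

static void
process_backup_message(const char *msg, size_t len)
{
    switch (msg[0])
    {
        case 'a':               /* new archive: identifier, then name */
            {
                uint16      archive_id;

                memcpy(&archive_id, msg + 1, sizeof(uint16));
                begin_new_archive(archive_id, msg + 3, len - 3);
                break;
            }

        case 'p':               /* payload for one specific archive */
            {
                uint16      archive_id;

                memcpy(&archive_id, msg + 1, sizeof(uint16));
                append_archive_payload(archive_id, msg + 3, len - 3);
                break;
            }

        case 'P':               /* progress only, no payload */
            {
                uint64      bytes_done; /* pre-compression byte count */

                memcpy(&bytes_done, msg + 1, sizeof(uint64));
                update_progress(bytes_done);
                break;
            }

        default:
            pg_log_error("unexpected message type \"%c\"", msg[0]);
            exit(1);
    }
}

Carrying the archive identifier in every 'p' message removes the
statefulness concern, and a separate 'P' message lets progress flow even
when no payload is ready to send.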
Greetings,
Andres Freund
On Fri, Jul 31, 2020 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:
Have you tested whether this still works against older servers? Or do
you think we should not have that as a goal?
I haven't tested that recently but I intended to keep it working. I'll
make sure to nail that down before I get to the point of committing
anything, but I don't expect big problems. It's kind of annoying to
have so much backward compatibility stuff here but I think ripping any
of that out should wait for another time.
Hm. I don't think I terribly like the idea of things like -R having to
be processed server side. That'll be awfully annoying to keep working
across versions, for one. But perhaps the config file should just not be
in the main tar file going forward?
That'd be a user-visible change, though, whereas what I'm proposing
isn't. Instead of directly injecting stuff, the client can just send
it to the server and have the server inject it, provided the server is
new enough. Cross-version issues don't seem to be any worse than now.
That being said, I don't love it, either. We could just suggest to
people that using -R together with server compression is not supported.
I think we should eventually be able to use one archive for multiple
purposes, e.g. to set up a standby as well as using it for a base
backup. Or multiple standbys with different tablespace remappings.
I don't think I understand your point here.
ISTM that that can help to some degree, but things like tablespace
remapping etc IMO aren't best done server side, so I think the client
will continue to need to know about the contents to a significnat
degree?
If I'm not mistaken, those mappings are only applied with -Fp i.e. if
we're extracting. And it's no problem to jigger things in that case;
we can only do this if we understand the archive in the first place.
The problem is when you have to decompress and recompress to jigger
things.
Wonder if there's a way to get this to be less stateful. It seems a bit
ugly that the client would know what the last 'a' was for a 'p'? Perhaps
we could actually make 'a' include an identifier for each archive, and
then 'p' would append to a specific archive? Which would then also would
allow for concurrent processing of those archives on the server side.
...says the guy working on asynchronous I/O. I don't know, it's not a
bad idea, but I think we'd have to change a LOT of code to make it
actually do something useful. I feel like this could be added as a
later extension of the protocol, rather than being something that we
necessarily need to do now.
I'd personally rather have a separate message type for progress and
payload. Seems odd to have to send payload messages with 0 payload just
because we want to update progress (in case of uploading to
e.g. S3). And I think it'd be nice if we could have a more extensible
progress measurement approach than a fixed-length prefix. E.g. it might
be nice to allow it to report both the overall progress and the
per-archive progress. Or we might want to send progress when uploading
to S3, even when the total size of the data directory hasn't been
pre-calculated.
I don't mind a separate message type here, but if you want merging of
short messages with adjacent longer messages to generate a minimal
number of system calls, that might have some implications for the
other thread where we're talking about how to avoid extra memory
copies when generating protocol messages. If you don't mind them going
out as separate network packets, then it doesn't matter.
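As a straw man for the more extensible progress format (again, only a
sketch, not something any of these patches implement), the message could
carry a list of labeled counters instead of a fixed-length prefix:

/*
 * Straw-man progress message: a count of items, where each item is a
 * null-terminated label (e.g. "overall" or an archive name) followed by
 * an int64 bytes_done and an int64 bytes_total, with -1 meaning the
 * total is unknown (e.g. because the size of the data directory wasn't
 * pre-calculated). Invented here for illustration only.
 */
typedef struct
{
    char        msgtype;        /* hypothetical new message type */
    uint16      nitems;         /* number of progress items that follow */
    /* per item: label (cstring), int64 bytes_done, int64 bytes_total */
} BackupProgressMessage;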
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 29, 2020, at 8:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, May 8, 2020 at 4:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
So it might be good if I'd remembered to attach the patches. Let's try
that again.
Here's an updated patch set.
Hi Robert,
v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies. It seems to be conflicting with Heikki's work from August. Could you rebase please?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 21, 2020 at 12:14 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies. It seems to be conflicting with Heikki's work from August. Could you rebase please?
Here at last is a new version. I've dropped the "bbarchiver" patch for
now, added a new patch that I'll talk about below, and revised the
others. I'm pretty happy with the code now, so I guess the main things
that I'd like feedback on are (1) whether design changes seem to be
needed and (2) the UI. Once we have that stuff hammered out, I'll work
on adding documentation, which is missing at present. The interesting
patches in terms of functionality are 0006 and 0007; the rest is
preparatory refactoring.
0006 adds a concept of base backup "targets": it lets you send the
base backup to someplace other than the client. You
specify the target using a new "-t" option to pg_basebackup. By way of
example, 0006 adds a "blackhole" target which throws the backup away
instead of sending it anywhere, and also a "server" target which
stores the backup to the server filesystem in lieu of streaming it to
the client. So you can say something like "pg_basebackup -Xnone -Ft -t
server:/backup/2021-07-08" and, provided that you're superuser, the
server will try to write the backup there. At present, you can't use
-Fp or -Xfetch or -Xstream with a backup target, because that
functionality is implemented on the client side. I think that's an
acceptable restriction. Eventually I imagine we will want to have
targets like "aws" or "s3" or maybe some kind of plug-in system for
new targets. I haven't designed anything like that yet, but I think
it's probably not all that hard to generalize what I've got.
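To show how little would have to change, adding a target is mostly a
matter of adding one more case to the dispatch that 0006 introduces. In
the sketch below, BACKUP_TARGET_S3 and bbsink_s3_new() are invented
names, not anything that exists in the patches:

    /* Sketch: routing the backup to a hypothetical new target. */
    switch (opt->target)
    {
        case BACKUP_TARGET_BLACKHOLE:
            /* Nothing to do, just discard data. */
            break;
        case BACKUP_TARGET_SERVER:
            sink = bbsink_server_new(sink, opt->target_detail);
            break;
        case BACKUP_TARGET_S3:
            /* Hypothetical: a sink that uploads each archive to S3. */
            sink = bbsink_s3_new(sink, opt->target_detail);
            break;
    }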
0007 adds server-side compression; currently, only gzip is supported,
but I hope that it won't be hard
to generalize that to support LZ4 as well, and Andres told me he
thinks we should aim to support zstd since that library has built-in
parallel compression which is very appealing in this context. So you
say something like "pg_basebackup -Ft --server-compression=gzip -D
/backup/2021-07-08" or, if you want that compressed backup stored on
the server and compressed as hard as possible, you could say
"pg_basebackup -Xnone -Ft --server-compression=gzip9 -t
server:/backup/2021-07-08". Unfortunately, here again there are a
number of features that are implemented on the client side, and they
don't work in combination with this. -Fp could be made to work by
teaching the client to decompress; I just haven't written the code to
do that. It's probably not very useful in general, but maybe there's a
use case if you're really tight on network bandwidth. Making -R work
looks outright useless, because the client would have to get the whole
compressed tarfile from the server and then uncompress it, edit the
tar file, and recompress. That seems like a thing no one can possibly
want. Also, if you say pg_basebackup -Ft -D- >whatever.tar,
pg_basebackup injects the backup manifest into the tarfile, which, if
you used --server-compression, would require decompressing and
recompressing the whole thing, so it doesn't seem worth supporting. It's
more likely to
be a footgun than to help anybody. This option can be used with
-Xstream or -Xfetch, but it doesn't compress pg_wal.tar, because
that's generated on the client side.
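Structurally, the compression support is just one more bbsink: it
compresses whatever lands in its buffer and forwards the result to the
next sink in the chain. As a rough sketch of the shape (the
bbsink_gzip_* names here are illustrative; 0007 is the authoritative
version):

/*
 * Sketch of a compression sink's callback table, following the
 * bbsink_ops pattern used elsewhere in this series: the archive
 * callbacks compress and forward, while the manifest and backup
 * boundary callbacks simply forward to the next sink.
 */
const bbsink_ops bbsink_gzip_ops = {
    .begin_backup = bbsink_gzip_begin_backup,
    .begin_archive = bbsink_gzip_begin_archive,
    .archive_contents = bbsink_gzip_archive_contents,
    .end_archive = bbsink_gzip_end_archive,
    .begin_manifest = bbsink_forward_begin_manifest,
    .manifest_contents = bbsink_forward_manifest_contents,
    .end_manifest = bbsink_forward_end_manifest,
    .end_backup = bbsink_forward_end_backup
};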
The thing I'm really unhappy with here is the -F option to
pg_basebackup, which presently allows only p for plain or t for tar.
For purposes of these patches, I've essentially treated this as if -Fp
means "I want the tar files the server sends to be extracted" and
"-Ft" as if it means "I'm happy with them the way they are." Under
that interpretation, it's fine for --server-compression to cause e.g.
base.tar.gz to be written, because that's what the server sent. But
it's not really a "tar" output format; it's a "tar.gz" output format.
However, it doesn't seem to make any sense to define -Fz to mean "I
want tar.gz output" because -Z or -z already produces tar.gz output
when used with -Ft, and also because it would be redundant to make
people specify both -Fz and --server-compression. Similarly, when you
use --target, the output format is arguably, well, nothing. I mean,
some tar files got stored to the target, but you don't have them. But
again it seems redundant to have people specify --target and then also
have to change the argument to -F. Hindsight being 20-20, I think we
would have been better off not having a -Ft or -Fp option at all, and
having an --extract option that says you want to extract what the
server sends you, but it's probably too late to make that change now.
Or maybe it isn't, and we should just break command-line argument
compatibility for v15. I don't know. Opinions appreciated, especially
if they are nuanced.
If you're curious about what the other patches in the series do,
here's a very fast recap; see commit messages for more. 0001 revises
the grammar for some replication commands to use an extensible-options
syntax. 0002 is a trivial refactoring of basebackup.c. 0003 and 0004
refactor the server's basebackup.c and the client's pg_basebackup.c,
respectively, by introducing abstractions called bbsink and
bbstreamer. 0005 introduces a new COPY sub-protocol for taking base
backups. I think it's worth mentioning that I believe that this
refactoring is quite powerful and could let us do a bunch of other
things that this patch set doesn't attempt. For instance, since this
makes it pretty easy to implement server-side compression, it could
probably also pretty easily be made to do server-side encryption, if
you're brave enough to want to have a discussion on pgsql-hackers
about how to design an encryption feature.
Thanks to my colleague Tushar Ahuja for helping test some of this code.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch
From 51e9982701eee197a9c6ae54830e4d6683f8a032 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 11:33:32 -0400
Subject: [PATCH v3 1/7] Flexible options for BASE_BACKUP and
CREATE_REPLICATION_SLOT.
Previously, these replication commands used an entirely hard-coded
syntax, but that's hard to extend. Instead, adopt the same kind of
syntax we've used for SQL commands such as VACUUM, ANALYZE, COPY,
and EXPLAIN, where it's not necessary for all of the option names
to be parser keywords.
This commit does not remove support for the old syntax. It just
adds the new one as an additional option, and makes pg_basebackup
prefer the new syntax when the server is new enough to support it.
v2: Fix compile error.
v3: Fix inverted test, as reported by Tushar Ahuja.
v4: Adjustments for v15.
---
src/backend/replication/basebackup.c | 33 ++---
.../libpqwalreceiver/libpqwalreceiver.c | 8 +-
src/backend/replication/repl_gram.y | 116 +++++++++++++++---
src/backend/replication/walsender.c | 17 +--
src/bin/pg_basebackup/pg_basebackup.c | 65 ++++++----
src/bin/pg_basebackup/streamutil.c | 102 +++++++++++++--
src/bin/pg_basebackup/streamutil.h | 12 ++
7 files changed, 273 insertions(+), 80 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e09108d0ec..b0b52d3b1a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "catalog/pg_type.h"
#include "common/file_perm.h"
+#include "commands/defrem.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
@@ -787,7 +788,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->label = strVal(defel->arg);
+ opt->label = defGetString(defel);
o_label = true;
}
else if (strcmp(defel->defname, "progress") == 0)
@@ -796,7 +797,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->progress = true;
+ opt->progress = defGetBoolean(defel);
o_progress = true;
}
else if (strcmp(defel->defname, "fast") == 0)
@@ -805,16 +806,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->fastcheckpoint = true;
+ opt->fastcheckpoint = defGetBoolean(defel);
o_fast = true;
}
- else if (strcmp(defel->defname, "nowait") == 0)
+ else if (strcmp(defel->defname, "wait") == 0)
{
if (o_nowait)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->nowait = true;
+ opt->nowait = !defGetBoolean(defel);
o_nowait = true;
}
else if (strcmp(defel->defname, "wal") == 0)
@@ -823,19 +824,19 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->includewal = true;
+ opt->includewal = defGetBoolean(defel);
o_wal = true;
}
else if (strcmp(defel->defname, "max_rate") == 0)
{
- long maxrate;
+ int64 maxrate;
if (o_maxrate)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- maxrate = intVal(defel->arg);
+ maxrate = defGetInt64(defel);
if (maxrate < MAX_RATE_LOWER || maxrate > MAX_RATE_UPPER)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
@@ -851,21 +852,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->sendtblspcmapfile = true;
+ opt->sendtblspcmapfile = defGetBoolean(defel);
o_tablespace_map = true;
}
- else if (strcmp(defel->defname, "noverify_checksums") == 0)
+ else if (strcmp(defel->defname, "verify_checksums") == 0)
{
if (o_noverify_checksums)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- noverify_checksums = true;
+ noverify_checksums = !defGetBoolean(defel);
o_noverify_checksums = true;
}
else if (strcmp(defel->defname, "manifest") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
bool manifest_bool;
if (o_manifest)
@@ -890,7 +891,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "manifest_checksums") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
if (o_manifest_checksums)
ereport(ERROR,
@@ -905,8 +906,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
o_manifest_checksums = true;
}
else
- elog(ERROR, "option \"%s\" not recognized",
- defel->defname);
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname));
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6eaa84a031..37a0d1c79a 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -871,19 +871,19 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
switch (snapshot_action)
{
case CRS_EXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " EXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, " (EXPORT_SNAPSHOT TRUE)");
break;
case CRS_NOEXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " NOEXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, " (EXPORT_SNAPSHOT FALSE)");
break;
case CRS_USE_SNAPSHOT:
- appendStringInfoString(&cmd, " USE_SNAPSHOT");
+ appendStringInfoString(&cmd, " (USE_SNAPSHOT)");
break;
}
}
else
{
- appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+ appendStringInfoString(&cmd, " PHYSICAL (RESERVE_WAL)");
}
res = libpqrcv_PQexec(conn->streamConn, cmd.data);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..69e990cda3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -95,16 +95,16 @@ static SQLCmd *make_sqlcmd(void);
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_legacy_opt_list generic_option_list
+%type <defelt> base_backup_legacy_opt generic_option
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
%type <node> plugin_opt_arg
-%type <str> opt_slot var_name
+%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
-%type <list> create_slot_opt_list
-%type <defelt> create_slot_opt
+%type <list> create_slot_options create_slot_legacy_opt_list
+%type <defelt> create_slot_legacy_opt
%%
@@ -157,12 +157,24 @@ var_name: IDENT { $$ = $1; }
;
/*
+ * BASE_BACKUP ( option [ 'value' ] [, ...] )
+ *
+ * We also still support the legacy syntax:
+ *
* BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
* [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
* [MANIFEST %s] [MANIFEST_CHECKSUMS %s]
+ *
+ * Future options should be supported only using the new syntax.
*/
base_backup:
- K_BASE_BACKUP base_backup_opt_list
+ K_BASE_BACKUP '(' generic_option_list ')'
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ $$ = (Node *) cmd;
+ }
+ | K_BASE_BACKUP base_backup_legacy_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
@@ -170,14 +182,14 @@ base_backup:
}
;
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
+base_backup_legacy_opt_list:
+ base_backup_legacy_opt_list base_backup_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-base_backup_opt:
+base_backup_legacy_opt:
K_LABEL SCONST
{
$$ = makeDefElem("label",
@@ -200,8 +212,8 @@ base_backup_opt:
}
| K_NOWAIT
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("wait",
+ (Node *)makeInteger(false), -1);
}
| K_MAX_RATE UCONST
{
@@ -215,8 +227,8 @@ base_backup_opt:
}
| K_NOVERIFY_CHECKSUMS
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("verify_checksums",
+ (Node *)makeInteger(false), -1);
}
| K_MANIFEST SCONST
{
@@ -231,8 +243,8 @@ base_backup_opt:
;
create_replication_slot:
- /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
- K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL [options] */
+ K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -242,8 +254,8 @@ create_replication_slot:
cmd->options = $5;
$$ = (Node *) cmd;
}
- /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
- | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin [options] */
+ | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -256,14 +268,19 @@ create_replication_slot:
}
;
-create_slot_opt_list:
- create_slot_opt_list create_slot_opt
+create_slot_options:
+ '(' generic_option_list ')' { $$ = $2; }
+ | create_slot_legacy_opt_list { $$ = $1; }
+ ;
+
+create_slot_legacy_opt_list:
+ create_slot_legacy_opt_list create_slot_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-create_slot_opt:
+create_slot_legacy_opt:
K_EXPORT_SNAPSHOT
{
$$ = makeDefElem("export_snapshot",
@@ -422,6 +439,65 @@ plugin_opt_arg:
sql_cmd:
IDENT { $$ = (Node *) make_sqlcmd(); }
;
+
+generic_option_list:
+ generic_option_list ',' generic_option
+ { $$ = lappend($1, $3); }
+ | generic_option
+ { $$ = list_make1($1); }
+ ;
+
+generic_option:
+ ident_or_keyword
+ {
+ $$ = makeDefElem($1, NULL, -1);
+ }
+ | ident_or_keyword IDENT
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword SCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword UCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeInteger($2), -1);
+ }
+ ;
+
+ident_or_keyword:
+ IDENT { $$ = $1; }
+ | K_BASE_BACKUP { $$ = "base_backup"; }
+ | K_IDENTIFY_SYSTEM { $$ = "identify_system"; }
+ | K_SHOW { $$ = "show"; }
+ | K_START_REPLICATION { $$ = "start_replication"; }
+ | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
+ | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
+ | K_TIMELINE_HISTORY { $$ = "timeline_history"; }
+ | K_LABEL { $$ = "label"; }
+ | K_PROGRESS { $$ = "progress"; }
+ | K_FAST { $$ = "fast"; }
+ | K_WAIT { $$ = "wait"; }
+ | K_NOWAIT { $$ = "nowait"; }
+ | K_MAX_RATE { $$ = "max_rate"; }
+ | K_WAL { $$ = "wal"; }
+ | K_TABLESPACE_MAP { $$ = "tablespace_map"; }
+ | K_NOVERIFY_CHECKSUMS { $$ = "noverify_checksums"; }
+ | K_TIMELINE { $$ = "timeline"; }
+ | K_PHYSICAL { $$ = "physical"; }
+ | K_LOGICAL { $$ = "logical"; }
+ | K_SLOT { $$ = "slot"; }
+ | K_RESERVE_WAL { $$ = "reserve_wal"; }
+ | K_TEMPORARY { $$ = "temporary"; }
+ | K_TWO_PHASE { $$ = "two_phase"; }
+ | K_EXPORT_SNAPSHOT { $$ = "export_snapshot"; }
+ | K_NOEXPORT_SNAPSHOT { $$ = "noexport_snapshot"; }
+ | K_USE_SNAPSHOT { $$ = "use_snapshot"; }
+ | K_MANIFEST { $$ = "manifest"; }
+ | K_MANIFEST_CHECKSUMS { $$ = "manifest_checksums"; }
+ ;
+
%%
static SQLCmd *
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 92c755f346..93f2d0ece4 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -895,7 +895,8 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
snapshot_action_given = true;
- *snapshot_action = CRS_USE_SNAPSHOT;
+ if (defGetBoolean(defel))
+ *snapshot_action = CRS_USE_SNAPSHOT;
}
else if (strcmp(defel->defname, "reserve_wal") == 0)
{
@@ -905,7 +906,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
reserve_wal_given = true;
- *reserve_wal = true;
+ *reserve_wal = defGetBoolean(defel);
}
else if (strcmp(defel->defname, "two_phase") == 0)
{
@@ -914,7 +915,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
two_phase_given = true;
- *two_phase = true;
+ *two_phase = defGetBoolean(defel);
}
else
elog(ERROR, "unrecognized option: %s", defel->defname);
@@ -984,7 +985,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (EXPORT_SNAPSHOT)")));
need_full_snapshot = true;
}
@@ -994,25 +995,25 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (XactIsoLevel != XACT_REPEATABLE_READ)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called in REPEATABLE READ isolation mode transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (FirstSnapshotSet)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called before any query",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (IsSubTransaction())
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called in a subtransaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
need_full_snapshot = true;
}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8bb0acf498..80912a0ea6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1801,10 +1801,6 @@ BaseBackup(void)
TimeLineID latesttli;
TimeLineID starttli;
char *basebkp;
- char escaped_label[MAXPGPATH];
- char *maxrate_clause = NULL;
- char *manifest_clause = NULL;
- char *manifest_checksums_clause = "";
int i;
char xlogstart[64];
char xlogend[64];
@@ -1813,8 +1809,11 @@ BaseBackup(void)
int serverVersion,
serverMajor;
int writing_to_stdout;
+ bool use_new_option_syntax = false;
+ PQExpBufferData buf;
Assert(conn != NULL);
+ initPQExpBuffer(&buf);
/*
* Check server version. BASE_BACKUP command was introduced in 9.1, so we
@@ -1832,6 +1831,8 @@ BaseBackup(void)
serverver ? serverver : "'unknown'");
exit(1);
}
+ if (serverMajor >= 1500)
+ use_new_option_syntax = true;
/*
* If WAL streaming was requested, also check that the server is new
@@ -1862,20 +1863,42 @@ BaseBackup(void)
/*
* Start the actual backup
*/
- PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);
-
+ AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
+ if (estimatesize)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
+ if (includewal == FETCH_WAL)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
+ if (fastcheckpoint)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "FAST");
+ if (includewal != NO_WAL)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "WAIT", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "NOWAIT");
+ }
if (maxrate > 0)
- maxrate_clause = psprintf("MAX_RATE %u", maxrate);
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
+ maxrate);
+ if (format == 't')
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+ if (!verify_checksums)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax,
+ "VERIFY_CHECKSUMS", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax,
+ "NOVERIFY_CHECKSUMS");
+ }
if (manifest)
{
- if (manifest_force_encode)
- manifest_clause = "MANIFEST 'force-encode'";
- else
- manifest_clause = "MANIFEST 'yes'";
+ AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
- manifest_checksums_clause = psprintf("MANIFEST_CHECKSUMS '%s'",
- manifest_checksums);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
if (verbose)
@@ -1890,18 +1913,10 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s %s",
- escaped_label,
- estimatesize ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
- manifest_clause ? manifest_clause : "",
- manifest_checksums_clause);
+ if (use_new_option_syntax && buf.len > 0)
+ basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
+ else
+ basebkp = psprintf("BASE_BACKUP %s", buf.data);
if (PQsendQuery(conn, basebkp) == 0)
{
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..9232ef77e2 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -490,6 +490,7 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
{
PQExpBuffer query;
PGresult *res;
+ bool use_new_option_syntax = (PQserverVersion(conn) >= 150000);
query = createPQExpBuffer();
@@ -498,27 +499,51 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
Assert(!(two_phase && is_physical));
Assert(slot_name != NULL);
- /* Build query */
+ /* Build base portion of query */
appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
if (is_temporary)
appendPQExpBufferStr(query, " TEMPORARY");
if (is_physical)
- {
appendPQExpBufferStr(query, " PHYSICAL");
+ else
+ appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+
+ /* Add any requested options */
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(query, " (");
+ if (is_physical)
+ {
if (reserve_wal)
- appendPQExpBufferStr(query, " RESERVE_WAL");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "RESERVE_WAL");
}
else
{
- appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
if (two_phase && PQserverVersion(conn) >= 150000)
- appendPQExpBufferStr(query, " TWO_PHASE");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "TWO_PHASE");
- if (PQserverVersion(conn) >= 100000)
- /* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ /* pg_recvlogical doesn't use an exported snapshot, so suppress */
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(query, use_new_option_syntax,
+ "EXPORT_SNAPSHOT", 0);
+ else
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "NOEXPORT_SNAPSHOT");
+ }
+ if (use_new_option_syntax)
+ {
+ /* Suppress option list if it would be empty, otherwise terminate */
+ if (query->data[query->len - 1] == '(')
+ {
+ query->len -= 2;
+ query->data[query->len] = '\0';
+ }
+ else
+ appendPQExpBufferChar(query, ')');
}
+ /* Now run the query */
res = PQexec(conn, query->data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
@@ -603,6 +628,67 @@ DropReplicationSlot(PGconn *conn, const char *slot_name)
return true;
}
+/*
+ * Append a "plain" option - one with no value - to a server command that
+ * is being constructed.
+ *
+ * In the old syntax, all options were parser keywords, so you could just
+ * write things like SOME_COMMAND OPTION1 OPTION2 'opt2value' OPTION3 42. The
+ * new syntax uses a comma-separated list surrounded by parentheses, so the
+ * equivalent is SOME_COMMAND (OPTION1, OPTION2 'optvalue', OPTION3 42).
+ */
+void
+AppendPlainCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name)
+{
+ if (buf->len > 0 && buf->data[buf->len - 1] != '(')
+ {
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(buf, ", ");
+ else
+ appendPQExpBufferChar(buf, ' ');
+ }
+
+ appendPQExpBuffer(buf, " %s", option_name);
+}
+
+/*
+ * Append an option with an associated string value to a server command that
+ * is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendStringCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, char *option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ if (option_value != NULL)
+ {
+ size_t length = strlen(option_value);
+ char *escaped_value = palloc(1 + 2 * length);
+
+ PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
+ appendPQExpBuffer(buf, " '%s'", escaped_value);
+ pfree(escaped_value);
+ }
+}
+
+/*
+ * Append an option with an associated integer value to a server command
+ * that is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendIntegerCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, int32 option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ appendPQExpBuffer(buf, " %d", option_value);
+}
/*
* Frontend version of GetCurrentTimestamp(), since we are not linked with
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..65135c79e0 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -15,6 +15,7 @@
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"
#include "libpq-fe.h"
+#include "pqexpbuffer.h"
extern const char *progname;
extern char *connection_string;
@@ -40,6 +41,17 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
TimeLineID *starttli,
XLogRecPtr *startpos,
char **db_name);
+
+extern void AppendPlainCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name);
+extern void AppendStringCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, char *option_value);
+extern void AppendIntegerCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, int32 option_value);
+
extern bool RetrieveWalSegSize(PGconn *conn);
extern TimestampTz feGetCurrentTimestamp(void);
extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
--
2.24.3 (Apple Git-128)
v3-0006-Support-base-backup-targets.patch
From a865b14bf535e7977b7326c643221df4602bc505 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 14:56:52 -0400
Subject: [PATCH v3 6/7] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 305 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 197 ++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 560 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 360453fad2..5faabe86a0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -45,8 +45,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -59,6 +61,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -243,14 +246,38 @@ perform_base_backup(basebackup_options *opt)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt->target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt->target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt->target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt->target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -701,6 +728,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -836,25 +865,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -866,6 +905,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index e0fe6a8cfa..90ddcff6d3 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -137,13 +140,14 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
sink->base.bbs_next = NULL;
sink->base.bbs_buffer_length = COPY_BUFFER_LENGTH;
+ sink->send_to_client = send_to_client;
/* Allow space for leading type byte, and initialize the type byte. */
sink->msgbuffer = palloc(COPY_BUFFER_LENGTH + 1);
@@ -204,8 +208,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -286,8 +294,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..714e9a695c
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,305 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Since we're not changing the data, we don't need our own buffer. */
+ sink->base.bbs_buffer = next->bbs_buffer;
+ sink->base.bbs_buffer_length = next->bbs_buffer_length;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write a chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index 69c80579cb..2f68b1ecb5 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -125,7 +125,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index ef7e6bfb77..a910915ccd 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 301e07e478..585da04ec9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -107,7 +107,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -124,6 +124,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -355,6 +356,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1219,15 +1222,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1299,24 +1309,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1681,7 +1699,33 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1776,8 +1820,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first once since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1788,7 +1837,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1871,7 +1921,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2005,8 +2055,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2028,7 +2081,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2062,6 +2115,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2112,7 +2166,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2153,6 +2207,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2289,18 +2346,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2310,6 +2399,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2326,6 +2425,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2359,8 +2461,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2381,6 +2493,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2388,6 +2501,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2397,6 +2513,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2439,11 +2558,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index ac3f4de57b..bf40ff3b64 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -255,9 +255,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6007827b44..6af924b6d4 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0fc7b7739a..a4da16f483 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3765,6 +3765,7 @@ backup_target_type
bbsink
bbsink_copystream
bbsink_ops
+bbsink_server
bbsink_state
bbsink_throttle
bbstreamer
--
2.24.3 (Apple Git-128)
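To make the intended use of the server target concrete, here is a minimal,
hypothetical libpq sketch -- it is not part of the patch set. It assumes the
parenthesized BASE_BACKUP option syntax used above and the TARGET /
TARGET_DETAIL option names from the diff; a real client (pg_basebackup with
--target) also consumes the result sets the server sends, which this sketch
glosses over by only checking the final status.

    /* compile with: cc target_sketch.c -lpq */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* "replication=1" requests a walsender connection. */
        PGconn     *conn = PQconnectdb("replication=1 dbname=postgres");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            exit(1);
        }

        /*
         * TARGET 'server' makes the backend store the archives itself;
         * TARGET_DETAIL (hypothetical usage) names the server-side path.
         */
        res = PQexec(conn, "BASE_BACKUP (TARGET 'server', "
                     "TARGET_DETAIL '/var/lib/backups/b1')");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "BASE_BACKUP failed: %s", PQerrorMessage(conn));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }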
Attachment: v3-0005-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
From d3be9d355e9f0ede7e218724846cb07ef4ed8e68 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 13:15:39 -0400
Subject: [PATCH v3 5/7] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
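To make the message framing concrete, the client-side handling boils down
to a dispatch of this shape (a sketch only; the helper functions are
hypothetical stand-ins for what pg_basebackup actually does in
ReceiveArchiveStreamChunk):

    /* Sketch: dispatch on the type byte of one CopyData payload. */
    static void
    process_copy_message(const char *buf, size_t len)
    {
        if (len < 1)
            parse_error();          /* hypothetical error helper */

        switch (buf[0])
        {
            case 'n':               /* new archive: name, tablespace path */
                start_new_archive(buf + 1, len - 1);
                break;
            case 'd':               /* archive or manifest data */
                absorb_data(buf + 1, len - 1);
                break;
            case 'p':               /* progress: 8-byte byte count */
                update_progress(buf + 1, len - 1);
                break;
            case 'm':               /* manifest data will follow */
                begin_manifest();
                break;
            default:
                parse_error();
        }
    }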
---
src/backend/replication/basebackup.c | 62 ++-
src/backend/replication/basebackup_copy.c | 256 ++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
src/tools/pgindent/typedefs.list | 3 +
5 files changed, 712 insertions(+), 53 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1ebab942be..360453fad2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -43,6 +43,12 @@
#include "utils/resowner.h"
#include "utils/timestamp.h"
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -52,6 +58,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -71,6 +78,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -223,7 +231,7 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- bbsink *sink = bbsink_copytblspc_new();
+ bbsink *sink;
bbsink *progress_sink;
/* Initial backup state, insofar as we know it now. */
@@ -233,6 +241,16 @@ perform_base_backup(basebackup_options *opt)
state.bytes_total = 0;
state.bytes_total_is_valid = false;
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -373,7 +391,10 @@ perform_base_backup(basebackup_options *opt)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(progress_sink);
@@ -611,6 +632,7 @@ perform_base_backup(basebackup_options *opt)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -678,8 +700,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -810,6 +834,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -1662,6 +1702,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that protocol, the
+ * server must not properly terminate the archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 5541334458..e0fe6a8cfa 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result has the same format in every method.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,51 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes sine the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
/*
* How much data do we want to send in one CopyData message? Note that
@@ -47,6 +111,17 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -58,6 +133,183 @@ const bbsink_ops bbsink_copytblspc_ops = {
.end_backup = bbsink_copytblspc_end_backup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->base.bbs_next = NULL;
+ sink->base.bbs_buffer_length = COPY_BUFFER_LENGTH;
+
+ /* Allow space for leading type byte, and initialize the type byte. */
+ sink->msgbuffer = palloc(COPY_BUFFER_LENGTH + 1);
+ sink->base.bbs_buffer = sink->msgbuffer + 1;
+ sink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fe5462ee54..301e07e478 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -52,6 +52,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -165,6 +175,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -981,10 +998,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1011,8 +1029,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1052,16 +1070,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1072,8 +1090,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1083,6 +1101,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1330,28 +1659,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1474,46 +1807,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index fc87071ef3..ac3f4de57b 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -255,6 +255,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 037032f85d..0fc7b7739a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3760,7 +3760,10 @@ yyscan_t
z_stream
z_streamp
zic_t
+ArchiveStreamState
+backup_target_type
bbsink
+bbsink_copystream
bbsink_ops
bbsink_state
bbsink_throttle
--
2.24.3 (Apple Git-128)
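A side note on the progress reporting in the patch above: the copystream
sink deliberately avoids reading the system clock for every chunk sent. The
pattern is cheap byte counting first, with the clock consulted only
occasionally. Sketched standalone below (clock and messaging helpers are
hypothetical):

    #define REPORT_BYTE_INTERVAL    65536   /* bytes between clock checks */
    #define REPORT_MS_THRESHOLD     1000    /* ms between client messages */

    static uint64 bytes_done;               /* updated as data is sent */
    static uint64 bytes_at_last_check;
    static int64  last_report_ms;

    static void
    maybe_report_progress(void)
    {
        int64   now_ms;

        /* Cheap test: too little new data, don't even look at the clock. */
        if (bytes_done < bytes_at_last_check + REPORT_BYTE_INTERVAL)
            return;
        bytes_at_last_check = bytes_done;

        now_ms = current_time_ms();         /* hypothetical clock helper */

        /* Report if due, or if the clock appears to have moved backward. */
        if (now_ms - last_report_ms < 0 ||
            now_ms - last_report_ms >= REPORT_MS_THRESHOLD)
        {
            last_report_ms = now_ms;
            send_progress_message(bytes_done);  /* hypothetical */
        }
    }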
Attachment: v3-0007-WIP-Server-side-gzip-compression.patch
From 24e1aaba076d5e7b81d4d30c604fdd7a006bc4d1 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 8 Jul 2021 11:07:04 -0400
Subject: [PATCH v3 7/7] WIP: Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
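For anyone unfamiliar with the zlib detail this patch relies on: passing
windowBits of 15 + 16 to deflateInit2() is what selects a gzip header
instead of a zlib header. A minimal standalone illustration, not taken from
the patch (compile with: cc gzip_sketch.c -lz):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    int
    main(void)
    {
        const char      in[] = "hello, base backup";
        unsigned char   out[256];
        z_stream        zs;

        memset(&zs, 0, sizeof(zs));
        if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         15 + 16,   /* 15 window bits + 16 => gzip header */
                         8,         /* default memory level */
                         Z_DEFAULT_STRATEGY) != Z_OK)
            exit(1);

        zs.next_in = (unsigned char *) in;
        zs.avail_in = sizeof(in) - 1;
        zs.next_out = out;
        zs.avail_out = sizeof(out);

        /* One call suffices here; streaming callers loop on deflate(). */
        if (deflate(&zs, Z_FINISH) != Z_STREAM_END)
            exit(1);

        printf("%zu bytes in, %lu bytes of gzip data out\n",
               sizeof(in) - 1, (unsigned long) zs.total_out);
        deflateEnd(&zs);
        return 0;
    }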
---
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 295 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 38 ++-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 374 insertions(+), 2 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5faabe86a0..ec50fbab12 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -51,6 +51,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -63,6 +69,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -282,6 +290,10 @@ perform_base_backup(basebackup_options *opt)
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt->compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt->compression_level);
+
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -730,11 +742,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str;
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -894,6 +908,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..e9ae50ac9b
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,295 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ /*
+ * We need our own buffer, because we're going to pass different data
+ * to the next sink than what gets passed to us.
+ *
+ * We could try making the input buffer bigger than the output buffer,
+ * because we expect that compression is going to shrink the input data.
+ * However, the compression ratio could be quite high (>10x), and to take
+ * full advantage of this we would need a huge input buffer. Instead
+ * it seems better to assume the input buffer may be filled multiple times
+ * before we succeed in filling the output buffer, and keep the input
+ * buffer relatively small. For now we just make it the same size as the
+ * output buffer.
+ */
+ sink->base.bbs_buffer_length = next->bbs_buffer_length;
+ sink->base.bbs_buffer = palloc(sink->base.bbs_buffer_length);
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 585da04ec9..295cb8d53e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -131,6 +131,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -990,7 +991,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -999,14 +1002,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1729,6 +1750,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2139,6 +2171,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2322,6 +2355,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index bf40ff3b64..9236642d93 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -257,6 +257,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
Attachment: v3-0004-Introduce-bbstreamer-abstraction-to-modularize-pg.patch
From f3071e5dfee7ba4d0b444ee1cedab59a35788979 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 12:00:34 -0400
Subject: [PATCH v3 4/7] Introduce 'bbstreamer' abstraction to modularize
pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unfortunately, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and then gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These changes do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
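To make the chaining concrete, here is a minimal sketch (not part of the
patch) showing how the pipeline described above could be assembled from the
constructors declared in bbstreamer.h; error handling and the construction
of recoveryconfcontents are elided:

    bbstreamer *streamer;

    /* Innermost streamer runs last: gzip the rebuilt archive, write it out. */
    streamer = bbstreamer_gzip_writer_new("base.tar.gz", NULL, 9);

    /* Rebuild tar headers for any members modified further upstream. */
    streamer = bbstreamer_tar_archiver_new(streamer);

    /* Append recovery settings to postgresql.auto.conf (v12+ behavior). */
    streamer = bbstreamer_recovery_injector_new(streamer, true,
                                                recoveryconfcontents);

    /* Outermost streamer runs first: parse the raw tar data from the server. */
    streamer = bbstreamer_tar_parser_new(streamer);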
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 ++++++
src/bin/pg_basebackup/bbstreamer_file.c | 573 ++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 906 +++++-----------------
src/tools/pgindent/typedefs.list | 10 +
7 files changed, 1691 insertions(+), 721 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 66e0070f1a..f693b75576 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -30,10 +30,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -56,7 +62,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..b24dc848c1
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one BBSTREAMER_ARCHIVE_TRAILER chunk, and it should
+ * follow the last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content) (bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize) (bbstreamer *streamer);
+ void (*free) (bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
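As an illustration of the ops-table pattern (again, not part of the patch),
a do-nothing pass-through bbstreamer built on the declarations above might
look like this; it forwards every chunk to its successor unmodified:

    typedef struct bbstreamer_passthrough
    {
        bbstreamer  base;
    } bbstreamer_passthrough;

    static void
    passthrough_content(bbstreamer *streamer, bbstreamer_member *member,
                        const char *data, int len,
                        bbstreamer_archive_context context)
    {
        /* Forward the chunk, unmodified, to the next streamer. */
        bbstreamer_content(streamer->bbs_next, member, data, len, context);
    }

    static void
    passthrough_finalize(bbstreamer *streamer)
    {
        bbstreamer_finalize(streamer->bbs_next);
    }

    static void
    passthrough_free(bbstreamer *streamer)
    {
        bbstreamer_free(streamer->bbs_next);
        pfree(streamer);
    }

    static const bbstreamer_ops passthrough_ops = {
        .content = passthrough_content,
        .finalize = passthrough_finalize,
        .free = passthrough_free
    };

    bbstreamer *
    bbstreamer_passthrough_new(bbstreamer *next)
    {
        bbstreamer_passthrough *streamer;

        streamer = palloc0(sizeof(bbstreamer_passthrough));
        *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
            &passthrough_ops;
        streamer->base.bbs_next = next;
        return &streamer->base;
    }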
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..0b0ada9736
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,573 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include <unistd.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map) (const char *);
+ void (*report_output_file) (const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it also identifies the
+ * file to be opened and closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any
+ * symbolic link, and which should return a replacement pathname to be used
+ * in its place. If NULL, the symbolic link target is used without
+ * modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a
+ * new output file. The pathname to that file is passed as an argument. If
+ * NULL, the call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = pstrdup(basepath);
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6 clusters) will
+ * have been created by the wal receiver process. Also, when the WAL
+ * directory location was specified, pg_wal (or pg_xlog) has already
+ * been created as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ pfree(mystreamer->basepath);
+ pfree(mystreamer);
+}
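For illustration (not something the patch itself contains), an extractor
with a link-remapping callback might be set up like this. Here remap_link
is a hypothetical stand-in for pg_basebackup's real --tablespace-mapping
logic, and progress_update_filename refers to the callback added to
pg_basebackup.c later in this patch:

    /*
     * Hypothetical link_map callback; a real one would consult the
     * user-supplied tablespace mappings.
     */
    static const char *
    remap_link(const char *linktarget)
    {
        return linktarget;
    }

    ...
    bbstreamer *extractor;

    extractor = bbstreamer_extractor_new("/path/to/destination",
                                         remap_link,
                                         progress_update_filename);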
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..4d15251fdc
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can inject recovery configuration into an
+ * archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+extern bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf; on
+ * older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+			/* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
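Note how this cooperates with the tar-specific streamers in
bbstreamer_tar.c: the zero-length BBSTREAMER_MEMBER_HEADER chunk emitted
by bbstreamer_inject_file() is the cue for bbstreamer_tar_archiver to
construct a fresh header. A minimal sketch of that pairing (not part of
the patch):

    bbstreamer *out = bbstreamer_plain_writer_new("base.tar", NULL);
    bbstreamer *arch = bbstreamer_tar_archiver_new(out);

    /* The archiver rebuilds the zero-length header chunk emitted here. */
    bbstreamer_inject_file(arch, "standby.signal", "", 0);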
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..5a9f587dca
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,444 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+extern bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+
+ /*
+ * If we're expecting an archive member header, accumulate a
+ * full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the file
+ * trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+				if (mystreamer->base.bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not the
+ * start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
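For reference, the hard-coded offsets above match the POSIX ustar header
layout, which is presumably what a more principled version of this code
would name symbolically:

    offset  length  field
    0       100     name
    100     8       mode
    108     8       uid
    116     8       gid
    124     12      size
    156     1       typeflag ('5' = directory, '2' = symlink)
    157     100     linkname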
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+}
+
+/*
+ * Create a bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+extern bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size is a fatal error (and
+ * indicates a bug).
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * blocks of zero bytes. Not all tar programs require this, but apparently
+ * some do. The server does not supply this trailer. If no archive trailer is
+ * present, one will be added by bbstreamer_tar_parser_finalize.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+		/* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
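To summarize the parser's state machine as implemented above (a reader's
aid, not part of the patch):

    MEMBER_HEADER   --(block parsed, size > 0)--------> MEMBER_CONTENTS
    MEMBER_HEADER   --(size == 0, empty trailer sent)-> MEMBER_HEADER
    MEMBER_HEADER   --(all-zero block seen)-----------> ARCHIVE_TRAILER
    MEMBER_CONTENTS --(file done, padding needed)-----> MEMBER_TRAILER
    MEMBER_CONTENTS --(file done, no padding)---------> MEMBER_HEADER
    MEMBER_TRAILER  --(padding sent)------------------> MEMBER_HEADER

    ARCHIVE_TRAILER is terminal; any remaining input is buffered and
    emitted by bbstreamer_tar_parser_finalize().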
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 80912a0ea6..fe5462ee54 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -27,17 +27,12 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
-#include "common/string.h"
#include "fe_utils/recovery_gen.h"
-#include "fe_utils/string_utils.h"
#include "getopt_long.h"
-#include "libpq-fe.h"
-#include "pgtar.h"
-#include "pgtime.h"
-#include "pqexpbuffer.h"
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
@@ -60,34 +55,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -159,10 +129,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -188,14 +159,15 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force,
- bool finished);
-
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force, bool finished);
+
+static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported);
+static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -358,21 +330,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -761,6 +718,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -773,8 +738,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* is moved to the next line.
*/
static void
-progress_report(int tablespacenum, const char *filename,
- bool force, bool finished)
+progress_report(int tablespacenum, bool force, bool finished)
{
int percent;
char totaldone_str[32];
@@ -814,7 +778,7 @@ progress_report(int tablespacenum, const char *filename,
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -830,7 +794,7 @@ progress_report(int tablespacenum, const char *filename,
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -844,7 +808,7 @@ progress_report(int tablespacenum, const char *filename,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -990,257 +954,170 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
}
/*
- * Write a piece of tar data
- */
-static void
-writeTarData(WriteTarState *state, char *buf, int r)
-{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-}
-
-/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
- *
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
+ * Figure out what to do with an archive received from the server based on
+ * the options selected by the user. We may just write the results directly
+ * to a file, or we might compress first, or we might extract the tar file
+ * and write each member separately. This function doesn't do any of that
+ * directly, but it works out what kind of bbstreamer we need to create so
+ * that the right stuff happens when, down the road, we actually receive
+ * the data.
*/
-static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
+static bbstreamer *
+CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported)
{
- char zerobuf[TAR_BLOCK_SIZE * 2];
- WriteTarState state;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer = NULL;
+ bool inject_manifest;
+ bool must_parse_archive;
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
+ /*
+ * Normally, we emit the backup manifest as a separate file, but when
+ * we're writing a tarfile to stdout, we don't have that option, so
+ * include it in the one tarfile we've got.
+ */
+ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ /*
+	 * We have to parse the archive if (1) we're supposed to extract it, or if
+ * (2) we need to inject backup_manifest or recovery configuration into it.
+ */
+ must_parse_archive = (format == 'p' || inject_manifest ||
+ (spclocation == NULL && writerecoveryconf));
- if (state.basetablespace)
+ if (format == 'p')
{
+ const char *directory;
+
/*
- * Base tablespaces
+ * In plain format, we must extract the archive. The data for the main
+ * tablespace will be written to the base directory, and the data for
+ * other tablespaces will be written to the directory where they're
+ * located on the server, after applying any user-specified tablespace
+ * mappings.
*/
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
-
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
- else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ directory = spclocation == NULL ? basedir
+ : get_tablespace_mapping(spclocation);
+ streamer = bbstreamer_extractor_new(directory,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
+ FILE *archive_file;
+ char archive_filename[MAXPGPATH];
+
/*
- * Specific tablespace
+ * In tar format, we just write the archive without extracting it.
+ * Normally, we write it to the archive name provided by the caller,
+ * but when the base directory is "-" that means we need to write
+ * to standard output.
*/
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(archive_filename, sizeof(archive_filename), "-");
+ archive_file = stdout;
}
else
-#endif
{
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
+ snprintf(archive_filename, sizeof(archive_filename),
+ "%s/%s", basedir, archive_name);
+ archive_file = NULL;
}
- }
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(archive_filename, ".gz", sizeof(archive_filename));
+ streamer = bbstreamer_gzip_writer_new(archive_filename,
+ archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+
+ /*
+ * If we need to parse the archive for whatever reason, then we'll
+ * also need to re-archive, because, if the output format is tar, the
+ * only point of parsing the archive is to be able to inject stuff
+ * into it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = archive_filename;
+ }
/*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
+ * If we're supposed to inject the backup manifest into the results,
+ * it should be done here, so that the file content can be injected
+ * directly, without worrying about the details of the tar format.
*/
+ if (inject_manifest)
+ manifest_inject_streamer = streamer;
- MemSet(zerobuf, 0, sizeof(zerobuf));
-
- if (state.basetablespace && writerecoveryconf)
+ /*
+ * If this is the main tablespace and we're supposed to write
+ * recovery information, arrange to do that.
+ */
+ if (spclocation == NULL && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ Assert(must_parse_archive);
+ streamer = bbstreamer_recovery_injector_new(streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ /*
+ * If we're doing anything that involves understanding the contents of
+ * the archive, we'll need to parse it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_parser_new(streamer);
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ /* Return the results. */
+ *manifest_inject_streamer_p = manifest_inject_streamer;
+ return streamer;
+}
- writeTarData(&state, header, sizeof(header));
+/*
+ * Receive raw tar data from the server, and stream it to the appropriate
+ * location. If we're writing a single tarfile to standard output, also
+ * receive the backup manifest and inject it into that tarfile.
+ */
+static void
+ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum)
+{
+ WriteTarState state;
+ bbstreamer *manifest_inject_streamer;
+ bool is_recovery_guc_supported;
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ /* Pass all COPY data through to the backup streamer. */
+ memset(&state, 0, sizeof(state));
+ is_recovery_guc_supported =
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ state.streamer = CreateBackupStreamer(archive_name, spclocation,
+ &manifest_inject_streamer,
+ is_recovery_guc_supported);
+ state.tablespacenum = tablespacenum;
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ progress_filename = NULL;
/*
- * Normally, we emit the backup manifest as a separate file, but when
- * we're writing a tarfile to stdout, we don't have that option, so
- * include it in the one tarfile we've got.
+ * The decision as to whether we need to inject the backup manifest into
+ * the output at this stage is made by CreateBackupStreamer; if that is
+ * needed, manifest_inject_streamer will be non-NULL; otherwise, it will
+ * be NULL.
*/
- if (strcmp(basedir, "-") == 0 && manifest)
+ if (manifest_inject_streamer != NULL)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
+ /* Slurp the entire backup manifest into a buffer. */
initPQExpBuffer(&buf);
ReceiveBackupManifestInMemory(conn, &buf);
if (PQExpBufferDataBroken(buf))
@@ -1248,42 +1125,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
- termPQExpBuffer(&buf);
- }
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
+ /* Inject it into the output tarfile. */
+ bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ buf.data, buf.len);
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
+ /* Free memory. */
+ termPQExpBuffer(&buf);
}
- progress_report(rownum, state.filename, true, false);
+ /* Cleanup. */
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+
+ progress_report(tablespacenum, true, false);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1299,184 +1154,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
+ bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
-
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
+ progress_report(state->tablespacenum, false, false);
}
@@ -1501,236 +1182,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true, false);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2023,16 +1474,32 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
+ /* Receive a tar file for each tablespace in turn */
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named base.tar
+ * if it's the main data directory or <tablespaceoid>.tar if it's for
+ * another tablespace. CreateBackupStreamer() will arrange to add .gz
+ * to the archive name if pg_basebackup is performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
+ }
/*
* Now receive backup manifest, if appropriate.
@@ -2048,7 +1515,10 @@ BaseBackup(void)
ReceiveBackupManifest(conn);
if (showprogress)
- progress_report(PQntuples(res), NULL, true, true);
+ {
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true, true);
+ }
PQclear(res);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b1dee9ea2d..037032f85d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3764,3 +3764,13 @@ bbsink
bbsink_ops
bbsink_state
bbsink_throttle
+bbstreamer
+bbstreamer_archive_context
+bbstreamer_gzip_writer
+bbstreamer_member
+bbstreamer_ops
+bbstreamer_plain_writer
+bbstreamer_recovery_injector
+bbstreamer_tar_archiver
+bbstreamer_tar_parser
--
2.24.3 (Apple Git-128)
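
To summarize the new client-side flow, here is an illustrative sketch (not
code from the patch; the function names are all taken from the diff above,
but error handling and the gzip-vs-plain branching are elided).
CreateBackupStreamer() builds a pipeline innermost-first, and the COPY
callback then just pushes each chunk in at the top:

    if (format == 'p')
        streamer = bbstreamer_extractor_new(directory,
                                            get_tablespace_mapping,
                                            progress_update_filename);
    else
        streamer = bbstreamer_plain_writer_new(archive_filename,
                                               archive_file);

    /* in tar format, parsing implies re-archiving */
    if (must_parse_archive && format != 'p')
        streamer = bbstreamer_tar_archiver_new(streamer);

    if (spclocation == NULL && writerecoveryconf)
        streamer = bbstreamer_recovery_injector_new(streamer,
                                                    is_recovery_guc_supported,
                                                    recoveryconfcontents);

    /* outermost step: parse the raw tar stream from the server */
    if (must_parse_archive)
        streamer = bbstreamer_tar_parser_new(streamer);

    /* ReceiveTarCopyChunk() then feeds every COPY chunk to the chain: */
    bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);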
v3-0002-Refactor-basebackup.c-s-_tarWriteDir-function.patch (application/octet-stream)
From 238e3252585201a3b056ef866fc72ccabcc8ef99 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v3 2/7] Refactor basebackup.c's _tarWriteDir() function.
Sometimes, we replace a symbolic link that we find in the data
directory with an actual directory within the tarfile that we
create. _tarWriteDir was responsible both for making this
substitution and also for writing the tar header for the
resulting directory into the tar file. Make it do only the first
of those things, and rename to convert_link_to_directory.
Substantially larger refactoring of this source file is planned,
but this little bit seemed to make sense to commit
independently.
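
As a sketch of the resulting call pattern (this just mirrors the hunks
below), each former _tarWriteDir() call site now reads:

    convert_link_to_directory(pathbuf, &statbuf);
    size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
                            sizeonly);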
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b0b52d3b1a..7d1ddd2f9f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -71,8 +71,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1371,7 +1370,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1387,7 +1388,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1399,7 +1402,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1873,12 +1878,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1887,8 +1891,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.3 (Apple Git-128)
v3-0003-Introduce-bbsink-abstraction-to-modularize-base-b.patch (application/octet-stream)
From a29c73dea16e7e7218ca5354999de1373a829feb Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 11:45:50 -0400
Subject: [PATCH v3 3/7] Introduce 'bbsink' abstraction to modularize base
backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
understands knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup process and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc', but in the future we might introduce
other options.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
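
As an illustrative sketch (condensed from the basebackup.c hunks below,
with error handling and the progress_sink bookkeeping elided), the sink
chain is assembled and driven roughly like this:

    bbsink *sink = bbsink_copytblspc_new();  /* end of chain: COPY OUT */

    if (opt->maxrate > 0)
        sink = bbsink_throttle_new(sink, opt->maxrate);
    sink = bbsink_progress_new(sink, opt->progress);

    bbsink_begin_backup(sink, &state);
    bbsink_begin_archive(sink, "base.tar");
    /* fill sink->bbs_buffer with up to bbs_buffer_length bytes, then: */
    bbsink_archive_contents(sink, cnt);      /* repeated per chunk */
    bbsink_end_archive(sink);
    bbsink_end_backup(sink, endptr, endtli);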
---
src/backend/replication/Makefile | 4 +
src/backend/replication/backup_manifest.c | 28 +-
src/backend/replication/basebackup.c | 664 +++++-------------
src/backend/replication/basebackup_copy.c | 331 +++++++++
src/backend/replication/basebackup_progress.c | 254 +++++++
src/backend/replication/basebackup_sink.c | 110 +++
src/backend/replication/basebackup_throttle.c | 202 ++++++
src/include/replication/backup_manifest.h | 5 +-
src/include/replication/basebackup_sink.h | 269 +++++++
src/tools/pgindent/typedefs.list | 4 +
10 files changed, 1357 insertions(+), 514 deletions(-)
create mode 100644 src/backend/replication/basebackup_copy.c
create mode 100644 src/backend/replication/basebackup_progress.c
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/backend/replication/basebackup_throttle.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..74b97cf126 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,10 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_copy.o \
+ basebackup_progress.o \
+ basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 8882444025..fd9798c9e3 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -18,6 +18,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/json.h"
static void AppendStringToManifest(backup_manifest_info *manifest, char *s);
@@ -307,9 +308,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char *checksumstringbuf;
size_t manifest_bytes_done = 0;
@@ -351,38 +351,28 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
- *
- * We choose to read back the data from the temporary file in chunks of
- * size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
- * size, so it seems to make sense to match that value here.
+ * Send the backup manifest.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
- char manifestbuf[BLCKSZ];
size_t bytes_to_read;
size_t rc;
- bytes_to_read = Min(sizeof(manifestbuf),
+ bytes_to_read = Min(sink->bbs_buffer_length,
manifest->manifest_size - manifest_bytes_done);
- rc = BufFileRead(manifest->buffile, manifestbuf, bytes_to_read);
+ rc = BufFileRead(manifest->buffile, sink->bbs_buffer,
+ bytes_to_read);
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 7d1ddd2f9f..1ebab942be 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -59,27 +56,25 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
+static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -90,46 +85,12 @@ static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
-/*
- * Size of each block sent into the tar stream for larger files.
- */
-#define TAR_SEND_SIZE 32768
-
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
-/* The starting XLOG position of the base backup. */
-static XLogRecPtr startptr;
-
/* Total number of checksum failures during base backup. */
static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -255,30 +216,29 @@ static const struct exclude_list_item noChecksumFiles[] = {
static void
perform_base_backup(basebackup_options *opt)
{
- TimeLineID starttli;
+ bbsink_state state;
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- List *tablespaces = NIL;
+ bbsink *sink = bbsink_copytblspc_new();
+ bbsink *progress_sink;
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ /* Initial backup state, insofar as we know it now. */
+ state.tablespaces = NIL;
+ state.tablespace_num = 0;
+ state.bytes_done = 0;
+ state.bytes_total = 0;
+ state.bytes_total_is_valid = false;
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -295,11 +255,11 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
- startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
- tblspc_map_file);
+ basebackup_progress_wait_checkpoint();
+ state.startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint,
+ &state.starttli,
+ labelfile, &state.tablespaces,
+ tblspc_map_file);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -312,7 +272,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -329,7 +288,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = -1;
- tablespaces = lappend(tablespaces, ti);
+ state.tablespaces = lappend(state.tablespaces, ti);
/*
* Calculate the total backup size by summing up the size of each
@@ -337,100 +296,53 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
- NULL);
+ tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
+ true, NULL, NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
+ state.bytes_total += tmp->size;
}
+ state.bytes_total_is_valid = true;
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
-
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, &state);
/* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
+ sendDir(sink, ".", 1, false, state.tablespaces,
+ sendtblspclinks, &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -438,32 +350,33 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ Assert(lnext(state.tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
+ bbsink_end_archive(sink);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -489,8 +402,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -501,7 +413,7 @@ perform_base_backup(basebackup_options *opt)
* shouldn't be such files, but if there are, there's little harm in
* including them.
*/
- XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLByteToSeg(state.startptr, startsegno, wal_segment_size);
XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
@@ -591,7 +503,6 @@ perform_base_backup(basebackup_options *opt)
{
char *walFileName = (char *) lfirst(lc);
int fd;
- char buf[TAR_SEND_SIZE];
size_t cnt;
pgoff_t len = 0;
@@ -630,22 +541,17 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
- while ((cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf),
+ while ((cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length,
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -674,7 +580,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +603,23 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
+ AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
+ endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -739,7 +645,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -951,155 +857,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", LSN_FORMAT_ARGS(ptr));
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
+ int bytes_done = 0,
len;
pg_checksum_context checksum_ctx;
@@ -1125,25 +891,23 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
- update_basebackup_progress(len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
+ elog(ERROR, "could not update checksum of file \"%s\"",
+ filename);
+
+ while (bytes_done < len)
{
- char buf[TAR_BLOCK_SIZE];
+ size_t remaining = len - bytes_done;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
+ memcpy(sink->bbs_buffer, content, nbytes);
+ bbsink_archive_contents(sink, nbytes);
+ bytes_done += nbytes;
}
- if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
- elog(ERROR, "could not update checksum of file \"%s\"",
- filename);
+ _tarWritePadding(sink, len);
AddFileToBackupManifest(manifest, NULL, filename, len,
(pg_time_t) statbuf.st_mtime, &checksum_ctx);
@@ -1157,7 +921,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1187,11 +951,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1210,8 +974,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1371,8 +1135,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1389,8 +1153,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1403,15 +1167,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1442,7 +1206,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1466,7 +1230,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1498,7 +1262,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1506,7 +1270,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1583,21 +1347,19 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
bool block_retry = false;
- char buf[TAR_SEND_SIZE];
uint16 checksum;
int checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
@@ -1618,7 +1380,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1659,9 +1421,11 @@ sendFile(const char *readfilename, const char *tarfilename,
*/
while (len < statbuf->st_size)
{
+ size_t remaining = statbuf->st_size - len;
+
/* Try to read some more data. */
- cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf), statbuf->st_size - len),
+ cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length, remaining),
len, readfilename, true);
/*
@@ -1678,7 +1442,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read a multiple of
* BLCKSZ bytes.
*/
- Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
if (verify_checksum && (cnt % BLCKSZ != 0))
{
@@ -1694,7 +1458,7 @@ sendFile(const char *readfilename, const char *tarfilename,
{
for (i = 0; i < cnt / BLCKSZ; i++)
{
- page = buf + BLCKSZ * i;
+ page = sink->bbs_buffer + BLCKSZ * i;
/*
* Only check pages which have not been modified since the
@@ -1704,7 +1468,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* this case. We also skip completely new pages, since they
* don't have a checksum yet.
*/
- if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+ if (!PageIsNew(page) && PageGetLSN(page) < sink->bbs_state->startptr)
{
checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
phdr = (PageHeader) page;
@@ -1726,7 +1490,8 @@ sendFile(const char *readfilename, const char *tarfilename,
/* Reread the failed block */
reread_cnt =
- basebackup_read_file(fd, buf + BLCKSZ * i,
+ basebackup_read_file(fd,
+ sink->bbs_buffer + BLCKSZ * i,
BLCKSZ, len + BLCKSZ * i,
readfilename,
false);
@@ -1773,34 +1538,29 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
/* Also feed it to the checksum machinery. */
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer, cnt) < 0)
elog(ERROR, "could not update checksum of base backup");
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
- if (len < statbuf->st_size)
+ while (len < statbuf->st_size)
{
- MemSet(buf, 0, sizeof(buf));
- while (len < statbuf->st_size)
- {
- cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
- elog(ERROR, "could not update checksum of base backup");
- update_basebackup_progress(cnt);
- len += cnt;
- throttle(cnt);
- }
+ size_t remaining = statbuf->st_size - len;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
+
+ MemSet(sink->bbs_buffer, 0, nbytes);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ nbytes) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ bbsink_archive_contents(sink, nbytes);
+ len += nbytes;
}
/*
@@ -1808,13 +1568,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
- }
+ _tarWritePadding(sink, len);
CloseTransientFile(fd);
@@ -1837,18 +1591,28 @@ sendFile(const char *readfilename, const char *tarfilename,
return true;
}
-
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[TAR_BLOCK_SIZE];
enum tarError rc;
+ /*
+ * As of this writing, the smallest supported block size is 1kB, which is
+ * twice TAR_BLOCK_SIZE. Since the buffer size is required to be a
+ * multiple of BLCKSZ, it should be safe to assume that the buffer is
+ * large enough to fit an entire tar block. We double-check by means of
+ * these assertions.
+ */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= BLCKSZ,
+ "BLCKSZ too small for tar block");
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
if (!sizeonly)
{
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ rc = tarCreateHeader(sink->bbs_buffer, filename, linktarget,
+ statbuf->st_size, statbuf->st_mode,
+ statbuf->st_uid, statbuf->st_gid,
statbuf->st_mtime);
switch (rc)
@@ -1870,134 +1634,48 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
- update_basebackup_progress(sizeof(h));
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
}
- return sizeof(h);
-}
-
-/*
- * If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
- */
-static void
-convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
-{
- /* If symlink, write it as a directory anyway */
-#ifndef WIN32
- if (S_ISLNK(statbuf->st_mode))
-#else
- if (pgwin32_is_junction(pathbuf))
-#endif
- statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
+ return TAR_BLOCK_SIZE;
}
/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
+ * Pad with zero bytes out to a multiple of TAR_BLOCK_SIZE.
*/
static void
-throttle(size_t increment)
+_tarWritePadding(bbsink *sink, int len)
{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
+ int pad = tarPaddingBytesRequired(len);
/*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
+ * As in _tarWriteHeader, it should be safe to assume that the buffer is
+ * large enough that we don't need to do this in multiple chunks.
*/
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
-
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+ Assert(pad <= TAR_BLOCK_SIZE);
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
+ if (pad > 0)
+ {
+ MemSet(sink->bbs_buffer, 0, pad);
+ bbsink_archive_contents(sink, pad);
}
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
}
/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
static void
-update_basebackup_progress(int64 delta)
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
+ /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+ if (S_ISLNK(statbuf->st_mode))
+#else
+ if (pgwin32_is_junction(pathbuf))
+#endif
+ statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
/*
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
new file mode 100644
index 0000000000..5541334458
--- /dev/null
+++ b/src/backend/replication/basebackup_copy.c
@@ -0,0 +1,331 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_copy.c
+ * send basebackup archives using one COPY OUT operation per
+ * tablespace, and an additional COPY OUT for the backup manifest
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_copy.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+/*
+ * How much data do we want to send in one CopyData message? Note that
+ * this may also result in reading the underlying files in chunks of this
+ * size.
+ *
+ * NB: The buffer size is required to be a multiple of the system block
+ * size, so use that value instead if it's bigger than our preference.
+ */
+#define COPY_BUFFER_LENGTH Max(32768, BLCKSZ)
+
+static void bbsink_copytblspc_begin_backup(bbsink *sink);
+static void bbsink_copytblspc_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copytblspc_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_archive(bbsink *sink);
+static void bbsink_copytblspc_begin_manifest(bbsink *sink);
+static void bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_manifest(bbsink *sink);
+static void bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendTablespaceList(List *tablespaces);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+
+const bbsink_ops bbsink_copytblspc_ops = {
+ .begin_backup = bbsink_copytblspc_begin_backup,
+ .begin_archive = bbsink_copytblspc_begin_archive,
+ .archive_contents = bbsink_copytblspc_archive_contents,
+ .end_archive = bbsink_copytblspc_end_archive,
+ .begin_manifest = bbsink_copytblspc_begin_manifest,
+ .manifest_contents = bbsink_copytblspc_manifest_contents,
+ .end_manifest = bbsink_copytblspc_end_manifest,
+ .end_backup = bbsink_copytblspc_end_backup
+};
+
+/*
+ * Create a new 'copytblspc' bbsink.
+ */
+bbsink *
+bbsink_copytblspc_new(void)
+{
+ bbsink *sink = palloc(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
+ sink->bbs_next = NULL;
+ sink->bbs_buffer_length = COPY_BUFFER_LENGTH;
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ return sink;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is sent as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_copytblspc_archive_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", LSN_FORMAT_ARGS(ptr));
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a result set via libpq describing the tablespace list.
+ */
+static void
+SendTablespaceList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..79e47fa359
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,254 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress tracking, including but not
+ * limited to command progress reporting.
+ *
+ * This should be used even if the PROGRESS option to the replication
+ * command BASE_BACKUP is not specified. Without that option, we won't
+ * have tallied up the size of the files that are going to need to be
+ * backed up, but we can still report to the command progress reporting
+ * facility how much data we've processed.
+ *
+ * Moreover, we also use this as a convenient place to update certain
+ * fields of the bbsink_state. That work is accurately described as
+ * keeping track of our progress, but it's not just for introspection.
+ * We need those fields to be updated properly in order for base backups
+ * to work.
+ *
+ * This particular basebackup sink requires extra callbacks that most base
+ * backup sinks don't. Rather than cramming those into the interface, we just
+ * have a few extra functions here that basebackup.c can call. (We could put
+ * the logic directly into that file as it's fairly simple, but it seems
+ * cleaner to have everything related to progress reporting in one place.)
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+static void bbsink_progress_begin_backup(bbsink *sink);
+static void bbsink_progress_archive_contents(bbsink *sink, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress tracking functions and
+ * forwards data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink));
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_progress_ops;
+ sink->bbs_next = next;
+
+ /* Since we're not changing the data, we don't need our own buffer. */
+ sink->bbs_buffer = next->bbs_buffer;
+ sink->bbs_buffer_length = next->bbs_buffer_length;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of the
+ * backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL, -1);
+
+ return sink;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+
+ /*
+ * Report that we are now streaming database files as a base backup. Also
+ * advertise the number of tablespaces, and, if known, the estimated total
+ * backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ if (sink->bbs_state->bytes_total_is_valid)
+ val[1] = sink->bbs_state->bytes_total;
+ else
+ val[1] = -1;
+ val[2] = list_length(sink->bbs_state->tablespaces);
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ bbsink_forward_begin_backup(sink);
+}
+
+/*
+ * End-of-archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ /*
+ * We expect one archive per tablespace, so reaching the end of an archive
+ * also means reaching the end of a tablespace. (Some day we might have a
+ * reason to decouple these concepts.)
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (sink->bbs_state->tablespace_num < list_length(sink->bbs_state->tablespaces))
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ sink->bbs_state->tablespace_num + 1);
+
+ /* Delegate to next sink. */
+ bbsink_forward_end_archive(sink);
+
+ /*
+ * This is a convenient place to update the bbsink_state's notion of which
+ * is the current tablespace. Note that the bbsink_state object is shared
+ * across all bbsink objects involved, but we're the outermost one and
+ * this is the very last thing we do.
+ */
+ sink->bbs_state->tablespace_num++;
+}
+
+/*
+ * Handle progress tracking for new archive contents.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+
+ /* First update bbsink_state with # of bytes done. */
+ state->bytes_done += len;
+
+ /* Now forward to next sink. */
+ bbsink_forward_archive_contents(sink, len);
+
+ /* Prepare to set # of bytes done for command progress reporting. */
+ val[nparam++] = state->bytes_done;
+
+ /*
+ * We may also want to update # of total bytes, to avoid overflowing past
+ * 100% or the full size. This may make the total size number change as we
+ * approach the end of the backup (the estimate will always be wrong if
+ * WAL is included), but that's better than having the done column be
+ * bigger than the total.
+ */
+ if (state->bytes_total_is_valid && state->bytes_done > state->bytes_total)
+ val[nparam++] = state->bytes_done;
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+ Assert(state->tablespace_num >= list_length(state->tablespaces) - 1);
+ Assert(state->tablespace_num <= list_length(state->tablespaces));
+
+ /*
+ * We report having finished all tablespaces at this point, even if the
+ * archive for the main tablespace is still open, because what's going to
+ * be added is WAL files, not files that are really from the main
+ * tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = list_length(state->tablespaces);
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..b51b0cc766
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,110 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+/*
+ * Forward begin_backup callback.
+ */
+void
+bbsink_forward_begin_backup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_state != NULL);
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state);
+}
+
+/*
+ * Forward begin_archive callback.
+ */
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+/*
+ * Forward archive_contents callback.
+ *
+ * Code that wants to use this should initialize its own bbs_buffer and
+ * bbs_buffer_length fields to the values from the successor sink. In cases
+ * where the buffer isn't shared, the data needs to be copied before forwarding
+ * the callback. We don't try to do that here, because there's really no
+ * reason to have separately allocated buffers containing identical
+ * data.
+ */
+void
+bbsink_forward_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_archive_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_archive callback.
+ */
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * Forward begin_manifest callback.
+ */
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward manifest_contents callback.
+ *
+ * As with the archive_contents callback, it's expected that the buffer is
+ * shared.
+ */
+void
+bbsink_forward_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_manifest callback.
+ */
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward end_backup callback.
+ */
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..69c80579cb
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,202 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The actual number of bytes, transfer of which may cause sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink);
+static void bbsink_throttle_archive_contents(bbsink *sink, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ /* Since we're not changing the data, we don't need our own buffer. */
+ sink->base.bbs_buffer = next->bbs_buffer;
+ sink->base.bbs_buffer_length = next->bbs_buffer_length;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ bbsink_forward_begin_backup(sink);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only a whole multiple of throttling_sample was
+ * processed. The rest will be done during the next call of this function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 099108910c..16ed7eec9b 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,7 +47,8 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
extern void FreeBackupManifest(backup_manifest_info *manifest);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..fc87071ef3
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,269 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * Taking a base backup produces one archive per tablespace directory,
+ * plus a backup manifest unless that feature has been disabled. The
+ * goal of the backup process is to put those archives and that manifest
+ * someplace, possibly after postprocessing them in some way. A 'bbsink'
+ * is an object to which those archives, and the manifest if present,
+ * can be sent.
+ *
+ * In practice, there will be a chain of 'bbsink' objects rather than
+ * just one, with callbacks being forwarded from one to the next,
+ * possibly with modification. Each object is responsible for a
+ * single task e.g. command progress reporting, throttling, or
+ * communication with the client.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Overall backup state shared by all bbsink objects for a backup.
+ *
+ * Before calling bbsink_begin_backup, the caller must initialize a bbsink_state
+ * object which will last for the lifetime of the backup, and must thereafter
+ * update it as required before each new call to a bbsink method. The bbsink
+ * will retain a pointer to the state object and will consult it to understand
+ * the progress of the backup.
+ *
+ * 'tablespaces' is a list of tablespaceinfo objects. It must be set before
+ * calling bbsink_begin_backup() and must not be modified thereafter.
+ *
+ * 'tablespace_num' is the index of the current tablespace within the list
+ * stored in 'tablespaces'.
+ *
+ * 'bytes_done' is the number of bytes read so far from $PGDATA.
+ *
+ * 'bytes_total' is the total number of bytes estimated to be present in
+ * $PGDATA, if we have estimated this.
+ *
+ * 'bytes_total_is_valid' is true if and only if a proper estimate has been
+ * stored into 'bytes_total'.
+ *
+ * 'startptr' and 'starttli' identify the point in the WAL stream at which
+ * the backup began. They must be set before calling bbsink_begin_backup()
+ * and must not be modified thereafter.
+ */
+typedef struct bbsink_state
+{
+ List *tablespaces;
+ int tablespace_num;
+ uint64 bytes_done;
+ uint64 bytes_total;
+ bool bytes_total_is_valid;
+ XLogRecPtr startptr;
+ TimeLineID starttli;
+} bbsink_state;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. It must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * 'bbs_state' is a pointer to the bbsink_state object for this backup.
+ * Every bbsink associated with this backup should point to the same
+ * underlying state object.
+ *
+ * In general it is expected that the values of these fields are set when
+ * a bbsink is created and that they do not change thereafter. It's OK
+ * to modify the data to which bbs_buffer or bbs_state point, but no changes
+ * should be made to the contents of this struct.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ char *bbs_buffer;
+ int bbs_buffer_length;
+ bbsink *bbs_next;
+ bbsink_state *bbs_state;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline
+ * functions rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /* This callback is invoked just once, at the very start of the backup. */
+ void (*begin_backup) (bbsink *sink);
+
+ /*
+ * For each archive transmitted to a bbsink, there will be one call to the
+ * begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ *
+ * Before invoking the archive_contents() callback, the caller should copy
+ * a number of bytes equal to what will be passed as len into bbs_buffer,
+ * but not more than bbs_buffer_length.
+ *
+ * It's generally good if the buffer is as full as possible before the
+ * archive_contents() callback is invoked, but it's not worth expending
+ * extra cycles to make sure it's absolutely 100% full.
+ */
+ void (*begin_archive) (bbsink *sink, const char *archive_name);
+ void (*archive_contents) (bbsink *sink, size_t len);
+ void (*end_archive) (bbsink *sink);
+
+ /*
+ * If a backup manifest is to be transmitted to a bbsink, there will be
+ * one call to the begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback. These calls will occur after all archives are transmitted.
+ *
+ * The rules for invoking the manifest_contents() callback are the same as
+ * for the archive_contents() callback above.
+ */
+ void (*begin_manifest) (bbsink *sink);
+ void (*manifest_contents) (bbsink *sink, size_t len);
+ void (*end_manifest) (bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup) (bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, bbsink_state *state)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_state = state;
+ sink->bbs_ops->begin_backup(sink);
+
+ Assert(sink->bbs_buffer != NULL);
+ Assert(sink->bbs_buffer_length > 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /*
+ * The caller should make a reasonable attempt to fill the buffer before
+ * calling this function, so it shouldn't be completely empty. Nor should
+ * it be filled beyond capacity.
+ */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->archive_contents(sink, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /* See comments in bbsink_archive_contents. */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->manifest_contents(sink, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ Assert(sink->bbs_state->tablespace_num == list_length(sink->bbs_state->tablespaces));
+
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a0936ead1..b1dee9ea2d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3760,3 +3760,7 @@ yyscan_t
z_stream
z_streamp
zic_t
+bbsink
+bbsink_ops
+bbsink_state
+bbsink_throttle
--
2.24.3 (Apple Git-128)
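To make the intended composition concrete, here is a minimal sketch of how
basebackup.c might assemble a chain from the constructors declared in
basebackup_sink.h above. The function name and parameters are stand-ins
(the actual assembly code is not part of this excerpt); the ordering follows
the comment in bbsink_progress_end_archive(), which expects the progress sink
to be outermost.

#include "postgres.h"

#include "replication/basebackup_sink.h"

/*
 * Sketch only: build a bbsink chain. The innermost sink talks to the
 * client; optional sinks wrap it. 'maxrate' and 'estimate_backup_size'
 * stand in for options parsed from the BASE_BACKUP command.
 */
static bbsink *
make_sink_chain(uint32 maxrate, bool estimate_backup_size)
{
	bbsink	   *sink;

	/* Innermost sink: ships archives and the manifest to the client. */
	sink = bbsink_copytblspc_new();

	/* Optionally limit the transfer rate. */
	if (maxrate > 0)
		sink = bbsink_throttle_new(sink, maxrate);

	/* Outermost sink: progress tracking wraps everything else. */
	sink = bbsink_progress_new(sink, estimate_backup_size);

	return sink;
}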
On 7/8/21 9:26 PM, Robert Haas wrote:
Here at last is a new version.
Please refer to this scenario, where a backup target using
--server-compression is closing the server connection
unexpectedly if we don't provide the --no-manifest option:
[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4 -t
server:/tmp/data_1 -Xnone
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
If we try the same scenario with -Ft, it works:
[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4 -Ft
-D data_0 -Xnone
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
[tushar@localhost bin]$
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Mon, Jul 12, 2021 at 5:51 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 7/8/21 9:26 PM, Robert Haas wrote:
Here at last is a new version.
Please refer to this scenario, where a backup target using
--server-compression is closing the server connection
unexpectedly if we don't provide the --no-manifest option:
[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4 -t
server:/tmp/data_1 -Xnone
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I think the problem is that bbsink_gzip_end_archive() is not
forwarding the end request to the next bbsink. The attached patch should
fix it.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
fixup_gzip_end_archive.patch (text/x-patch)
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index e9ae50a..fc11e36 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -259,6 +259,8 @@ bbsink_gzip_end_archive(bbsink *sink)
bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
mysink->bytes_written = 0;
}
+
+ bbsink_forward_end_archive(sink);
}
/*
On 7/8/21 9:26 PM, Robert Haas wrote:
Here at last is a new version.
If I try to perform pg_basebackup using the "-t server" option against
localhost vs. a remote machine, I can see a difference in backup size.
The data directory size is:
[edb@centos7tushar bin]$ du -sch data/
578M data/
578M total
-h=localhost
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/all_data2 -h
localhost -Xnone --no-manifest -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
NOTICE: all required WAL segments have been archived
329595/329595 kB (100%), 1/1 tablespace
pg_basebackup: base backup completed
[edb@centos7tushar bin]$ du -sch /tmp/all_data2
322M /tmp/all_data2
322M total
[edb@centos7tushar bin]$
-h=remote
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/all_data2 -h
<remote IP> -Xnone --no-manifest -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
NOTICE: all required WAL segments have been archived
170437/170437 kB (100%), 1/1 tablespace
pg_basebackup: base backup completed
[edb@0 bin]$ du -sch /tmp/all_data2
167M /tmp/all_data2
167M total
[edb@0 bin]$
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Fri, Jul 16, 2021 at 12:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Jul 12, 2021 at 5:51 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 7/8/21 9:26 PM, Robert Haas wrote:
Here at last is a new version.
Please refer to this scenario, where a backup target using
--server-compression is closing the server connection
unexpectedly if we don't provide the --no-manifest option:
[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4 -t
server:/tmp/data_1 -Xnone
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I think the problem is that bbsink_gzip_end_archive() is not
forwarding the end request to the next bbsink. The attached patch should
fix it.
I was going through the patch; I think the refactoring has made the base
backup code really clean and readable. I have a few minor
suggestions.
v3-0003
1.
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
I have noticed that the interface for forwarding the request to the next
bbsink is not uniform: for example, bbsink_gzip_begin_archive() calls
bbsink_begin_archive(sink->bbs_next, gz_archive_name); to forward
the request to the next bbsink, whereas
bbsink_progress_begin_backup() calls
bbsink_forward_begin_backup(sink). I think it would be good to keep
the usage uniform, as in the sketch below.
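For illustration, a minimal sketch of the uniform style
(bbsink_gzip_begin_archive() is from v3-0003 and is not shown in full here;
the ".gz" naming is assumed):

#include "postgres.h"

#include "replication/basebackup_sink.h"

/*
 * Hypothetical reworking of the gzip sink's begin_archive callback in the
 * uniform style: delegate through bbsink_forward_begin_archive() rather
 * than calling bbsink_begin_archive(sink->bbs_next, ...) directly.
 */
static void
bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
{
	/* The gzip sink renames the archive; assume a ".gz" suffix. */
	char	   *gz_archive_name = psprintf("%s.gz", archive_name);

	/* Uniform style: the forwarding helper checks bbs_next for us. */
	bbsink_forward_begin_archive(sink, gz_archive_name);
}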
2.
I have noticed that the bbsink_copytblspc_* callbacks are not forwarding the
request to the next sink; that's probably because we assume this should
always be the last sink. I agree that's true for this patch, but the
commit message of the patch says that in future this might change, so
wouldn't it be good to keep the interface generic? I mean that
bbsink_copytblspc_new() should take the next sink as an input, and the
caller can pass it as NULL. The other APIs can then also try to
forward the request if next is not NULL, roughly as sketched below.
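Something like this hypothetical variant of the constructor
(bbsink_copytblspc_ops and COPY_BUFFER_LENGTH as defined in
basebackup_copy.c above):

#include "postgres.h"

#include "replication/basebackup_sink.h"

/*
 * Hypothetical generic form of bbsink_copytblspc_new(): the successor
 * sink becomes a constructor argument, NULL for a terminal sink as today.
 */
bbsink *
bbsink_copytblspc_new(bbsink *next)
{
	bbsink	   *sink = palloc0(sizeof(bbsink));

	*((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
	sink->bbs_next = next;		/* may be NULL when this is the last sink */
	sink->bbs_buffer_length = COPY_BUFFER_LENGTH;
	sink->bbs_buffer = palloc(sink->bbs_buffer_length);

	return sink;
}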
3.
It would make more sense to order the functions in
basebackup_progress.c the same way as in the other files, i.e.
bbsink_progress_begin_backup, bbsink_progress_archive_contents and
then bbsink_progress_end_archive; this would also be in sync with
the function pointer declarations in bbsink_ops.
v3-0005
4.
+ *
+ * 'copystream' sends a starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
typo 'copystream' sends a starts a single COPY OUT --> 'copystream'
sends a single COPY OUT
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 7/16/21 12:43 PM, Dilip Kumar wrote:
I think the problem is that bbsink_gzip_end_archive() is not
forwarding the end request to the next bbsink. The attached patch should
fix it.
Thanks, Dilip. The reported issue seems to be fixed now with your patch:
[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4 -t
server:/tmp/data_2 -v -Xnone -R
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
NOTICE: all required WAL segments have been archived
pg_basebackup: base backup completed
[edb@centos7tushar bin]$
OR
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/pv1 -Xnone
--server-compression=gzip4 -r 1024 -P
NOTICE: all required WAL segments have been archived
23133/23133 kB (100%), 1/1 tablespace
[edb@centos7tushar bin]$
Please refer to this scenario, where the -R option is working with '-t server'
but not with -Ft.
--not working
[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4
-Ft -D ccv -Xnone -R --no-manifest
pg_basebackup: error: unable to parse archive: base.tar.gz
pg_basebackup: only tar archives can be parsed
pg_basebackup: the -R option requires pg_basebackup to parse the archive
pg_basebackup: removing data directory "ccv"
--working
[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4 -t
server:/tmp/ccv -Xnone -R --no-manifest
NOTICE: all required WAL segments have been archived
[edb@centos7tushar bin]$
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Mon, Jul 19, 2021 at 6:02 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 7/16/21 12:43 PM, Dilip Kumar wrote:
I think the problem is that bbsink_gzip_end_archive() is not
forwarding the end request to the next bbsink. The attached patch should
fix it.
Thanks, Dilip. The reported issue seems to be fixed now with your patch.
Thanks for the confirmation.
Please refer to this scenario, where the -R option is working with '-t server'
but not with -Ft.
--not working
[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4
-Ft -D ccv -Xnone -R --no-manifest
pg_basebackup: error: unable to parse archive: base.tar.gz
pg_basebackup: only tar archives can be parsed
pg_basebackup: the -R option requires pg_basebackup to parse the archive
pg_basebackup: removing data directory "ccv"
As per the error message and the code, if we give -R then we need to
inject a recovery configuration file, and that is only supported with the
tar format; but since you are enabling server compression, the output is
no longer in tar format, so it gives an error.
--working
[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4 -t
server:/tmp/ccv -Xnone -R --no-manifest
NOTICE: all required WAL segments have been archived
[edb@centos7tushar bin]$
I am not sure why this is working; from the code, I could not determine
whether, when the backup target is the server, we do anything with the -R
option or just silently ignore it.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Jul 8, 2021, at 8:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
The interesting
patches in terms of functionality are 0006 and 0007;
The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence of not sufficiently abstracting out the handling of the tar format. If the bbsink and bbstreamer abstractions fully encapsulated a set of parsing callbacks, then pg_basebackup wouldn't contain things like:
streamer = bbstreamer_tar_parser_new(streamer);
but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever. It just seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by knowing too much about the format. It might even be a useful exercise to write cpio support into this patch set rather than waiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over.
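For example, a minimal sketch of that kind of dispatch, under the assumption
that each parser constructor takes the successor streamer the way
bbstreamer_tar_parser_new() does; the table, the lookup helper, and the cpio
entry are hypothetical:

#include "postgres_fe.h"

#include "bbstreamer.h"		/* client-side streamer API from this patch set */

typedef bbstreamer *(*parser_ctor) (bbstreamer *next);

typedef struct archive_format
{
	const char *suffix;			/* e.g. ".tar" */
	parser_ctor	make_parser;
} archive_format;

static const archive_format parser_table[] = {
	{".tar", bbstreamer_tar_parser_new},
	/* {".cpio", bbstreamer_cpio_parser_new},  hypothetical */
};

/* Pick a parser by archive suffix; NULL means we cannot parse it. */
static bbstreamer *
make_parser_for(const char *archive_name, bbstreamer *next)
{
	size_t		alen = strlen(archive_name);

	for (int i = 0; i < lengthof(parser_table); i++)
	{
		size_t		slen = strlen(parser_table[i].suffix);

		if (alen >= slen &&
			strcmp(archive_name + alen - slen, parser_table[i].suffix) == 0)
			return parser_table[i].make_parser(next);
	}
	return NULL;
}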
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -z, --gzip compress tar output\n"));
printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
This is the pre-existing --help output, not changed by your patch, but if you anticipate that other output formats will be supported in future releases, perhaps it's better not to write the --help output in such a way as to imply that -z and -Z are somehow connected with the choice of tar format? Would changing the --help now make for less confusion later? I'm just asking...
The new options to pg_basebackup should have test coverage in src/bin/pg_basebackup/t/010_pg_basebackup.pl, though I expect you are waiting to hammer out the interface before writing the tests.
the rest is
preparatory refactoring.
patch v3-0001:
The new function AppendPlainCommandOption writes too many spaces, which does no harm, but seems silly, resulting in lines like:
LOG: received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, WAIT 0, MANIFEST 'yes')
patch v3-0003:
The introduction of the sink abstraction seems incomplete, as basebackup.c still has knowledge of things like tar headers. Calls like _tarWriteHeader(sink, ...) feel like an abstraction violation. I expected perhaps this would get addressed in later patches, but it doesn't.
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. It must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
The length must be a multiple of BLCKSZ, not the pointer.
patch v3-0005:
+ * 'copystream' sends a starts a single COPY OUT operation and transmits
too many verbs.
+ * Regardless of which method is used, we sent a result set with
"is used" vs. "sent" verb tense mismatch.
+ * So we only check it after the number of bytes sine the last check reaches
typo. s/sine/since/
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
src/bin/pg_basebackup/pg_basebackup.c contains word wrap changes like the above which would better be left to a different commit, if done at all.
+ if (state.manifest_file !=NULL)
Need a space after !=
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 19, 2021 at 2:51 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence of not sufficiently abstracting out the handling of the tar format. If the bbsink and bbstreamer abstractions fully encapsulated a set of parsing callbacks, then pg_basebackup wouldn't contain things like:
streamer = bbstreamer_tar_parser_new(streamer);
but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever. It just seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by knowing too much about the format. It might even be a useful exercise to write cpio support into this patch set rather than waiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over.
Well, I had a patch in an earlier patch set that tried to get
knowledge of tar out of basebackup.c, but it couldn't use the bbsink
abstraction; it needed a whole separate abstraction layer which I had
called bbarchiver with a different API. So I dropped it, for fear of
being told, not without some justification, that I was just changing
things for the sake of changing them, and also because having exactly
one implementation of some interface is really not great. I do
conceptually like the idea of making the whole thing flexible enough
to generate cpio or zip archives, because like you I think that having
tar-specific stuff all over the place is grotty, but I have a feeling
there's little market demand for having pg_basebackup produce cpio,
pax, zip, iso, etc. archives. On the other hand, server-side
compression and server-side backup seem like functionality with real
utility. Still, if you or others want to vote for resurrecting
bbarchiver on the grounds that general code cleanup is worthwhile for
its own sake, I'm OK with that, too.
I don't really understand what your problem is with how the patch set
leaves pg_basebackup. On the server side, because I dropped the
bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff
about tar. pg_basebackup.c, however, really doesn't know anything much
about tar any more. It knows that if it's getting a tar file and needs
to parse a tar file then it had better call the tar parsing code, but
that seems difficult to avoid. What we can avoid, and I think the
patch set does, is pg_basebackup.c having any real knowledge of what
the tar parser is doing under the hood.
Thanks also for the detailed comments. I'll try to the right number of
verbs in each sentence in the next version of the patch. I will also
look into the issues mentioned by Dilip and Tushar.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Jul 20, 2021, at 11:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I don't really understand what your problem is with how the patch set
leaves pg_basebackup.
I don't have a problem with how the patch set leaves pg_basebackup.
On the server side, because I dropped the
bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff
about tar. pg_basebackup.c, however, really doesn't know anything much
about tar any more. It knows that if it's getting a tar file and needs
to parse a tar file then it had better call the tar parsing code, but
that seems difficult to avoid.
I was only imagining having a callback for injecting manifests or recovery configurations. It is not necessary that this be done in the current patch set, or perhaps ever.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jul 20, 2021 at 4:03 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I was only imagining having a callback for injecting manifests or recovery configurations. It is not necessary that this be done in the current patch set, or perhaps ever.
A callback where?
I actually think the ideal scenario would be if the server always did
all the work and the client wasn't involved in editing the tarfile,
but it's not super-easy to get there from here. We could add an option
to tell the server whether to inject the manifest into the archive,
which probably wouldn't be too bad. For it to inject the recovery
configuration, we'd have to send that configuration to the server
somehow. I thought about using COPY BOTH mode instead of COPY OUT mode
to allow for stuff like that, but it seems pretty complicated, and I
wasn't really sure that we'd get consensus that it was better even if
I went to the trouble of coding it up.
If we don't do that and stick with the current system where it's
handled on the client side, then I agree that we want to separate the
tar-specific concerns from the injection-type concerns, which the
patch does by making those operations different kinds of bbstreamer
that know only a relatively limited amount about what each other are
doing. You get [server] => [tar parser] => [recovery injector] => [tar
archiver], where the [recovery injector] step nukes the archive file
headers for the files it adds or modifies, and the [tar archiver] step
fixes them up again. So the only thing that the [recovery injector]
piece needs to know is that if it makes any changes to a file, it
should send that file to the next step with a 0-length archive header,
and all the [tar archiver] piece needs to know is that already-valid
headers can be left alone and 0-length ones need to be regenerated.
There may be a better scheme; I don't think this is perfectly elegant.
I do think it's better than what we've got now.
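To make that concrete, here is a rough sketch of how the client-side
chain might get assembled. The constructor names and signatures here are
approximations of the bbstreamer API in the patch set, not exact code,
and "archive_filename"/"archive_file" just stand in for whatever output
target the caller has set up:

/*
 * Rough sketch, not exact code: assemble the pipeline described above,
 * starting from the final consumer and wrapping each earlier step
 * around it.
 */
bbstreamer *streamer;

/* Final step: write the re-terminated tar archive out to a file. */
streamer = bbstreamer_plain_writer_new(archive_filename, archive_file);

/* Regenerate any 0-length archive headers left behind by the injector. */
streamer = bbstreamer_tar_archiver_new(streamer);

/* Add or modify recovery files, zeroing the headers of files it touches. */
streamer = bbstreamer_recovery_injector_new(streamer,
											is_recovery_guc_supported,
											recoveryconfcontents);

/* Parse the raw tar stream arriving from the server. */
streamer = bbstreamer_tar_parser_new(streamer);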
--
Robert Haas
EDB: http://www.enterprisedb.com
On Jul 21, 2021, at 8:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
A callback where?
If you were going to support lots of formats, not just tar, you might want the streamer class for each format to have a callback which sets up the injector, rather than having CreateBackupStreamer do it directly. Even then, having now studied CreateBackupStreamer a bit more, the idea seems less appealing than it did initially. I don't think it makes things any cleaner when only supporting tar, and maybe not even when supporting multiple formats, so I'll withdraw the suggestion.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 21, 2021 at 12:11 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
If you were going to support lots of formats, not just tar, you might want the streamer class for each format to have a callback which sets up the injector, rather than having CreateBackupStreamer do it directly. Even then, having now studied CreateBackupStreamer a bit more, the idea seems less appealing than it did initially. I don't think it makes things any cleaner when only supporting tar, and maybe not even when supporting multiple formats, so I'll withdraw the suggestion.
Gotcha. I think if we had a lot of formats I'd probably make a
separate function where you passed in the file extension and archive
type and it hands you back a parser for the appropriate kind of
archive, or something like that. And then maybe a second, similar
function where you pass in the injector and archive type and it wraps
an archiver of the right type around it and hands that back. But I
don't think that's worth doing until we have 2 or 3 formats, which may
or may not happen any time in the foreseeable future.
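Just as a hypothetical sketch of what that could look like if we ever
get there (none of these names exist in the patch set, and the cpio
parser in particular is entirely made up):

typedef enum
{
	ARCHIVE_FORMAT_TAR,
	ARCHIVE_FORMAT_CPIO
} archive_format;

/*
 * Hypothetical only: hand back a parser for the appropriate kind of
 * archive, so callers need not know which format they are handling.
 * bbstreamer_cpio_parser_new() does not exist anywhere.
 */
static bbstreamer *
create_archive_parser(archive_format format, bbstreamer *next)
{
	switch (format)
	{
		case ARCHIVE_FORMAT_TAR:
			return bbstreamer_tar_parser_new(next);
		case ARCHIVE_FORMAT_CPIO:
			return bbstreamer_cpio_parser_new(next);
	}
	return NULL;				/* keep compiler quiet */
}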
--
Robert Haas
EDB: http://www.enterprisedb.com
On 7/19/21 8:29 PM, Dilip Kumar wrote:
I am not sure why this is working; from the code I could not tell
whether, when the backup target is the server, we do anything with the
-R option or just silently ignore it.
OK, in another scenario I can see "-t server" working with the
"--server-compression" option but not with -z or -Z?
"-t server" with option "-z" or "-Z" (failing):
[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataN -Xnone -z
--no-manifest -p 9033
pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.
tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataNa -Z 1
-Xnone --server-compression=gzip4 --no-manifest -p 9033
pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.
"-t server" with "server-compression" (working)
[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataN -Xnone
--server-compression=gzip4 --no-manifest -p 9033
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
[tushar@localhost bin]$
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Thu, Jul 22, 2021 at 1:14 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 7/19/21 8:29 PM, Dilip Kumar wrote:
I am not sure why this is working; from the code I could not tell
whether, when the backup target is the server, we do anything with the
-R option or just silently ignore it.
OK, in another scenario I can see "-t server" working with the
"--server-compression" option but not with -z or -Z?
Right. The error messages or documentation might need some work, but
it's expected that you won't be able to do client-side compression if
the backup is being sent someplace other than to the client.
--
Robert Haas
EDB: http://www.enterprisedb.com
0007 adds server-side compression; currently, it only supports
server-side compression using gzip, but I hope that it won't be hard
to generalize that to support LZ4 as well, and Andres told me he
thinks we should aim to support zstd since that library has built-in
parallel compression which is very appealing in this context.
Thanks, Robert, for laying the foundation here.
So, I gave the LZ4 streaming API a try for server-side compression.
The LZ4 APIs are documented here[1]https://fossies.org/linux/lz4/doc/lz4frame_manual.html.
With the attached WIP patch, I am now able to take a backup using lz4
compression. The patch is applicable on top of Robert's V3
patch set[2]/messages/by-id/CA+TgmoYgVN=-Yoh71r3P9N7eKysd7_9b9s+1QFfFcs3w7Z-tig@mail.gmail.com.
I could take the backup using the command:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4
Further, when I restored the backup `/tmp/data_lz4` and started the
server, I could see the tables I created, along with the data inserted
on the original server.
When I compared the original data directory with the backup `data_lz4`
directory, here is how it looked:
$ diff -qr data/ /tmp/data_lz4
Only in /tmp/data_lz4: backup_label
Only in /tmp/data_lz4: backup_manifest
Only in data/base: pgsql_tmp
Only in /tmp/data_lz4: base.tar
Only in /tmp/data_lz4: base.tar.lz4
Files data/global/pg_control and /tmp/data_lz4/global/pg_control differ
Files data/logfile and /tmp/data_lz4/logfile differ
Only in data/pg_stat: db_0.stat
Only in data/pg_stat: global.stat
Only in data/pg_subtrans: 0000
Only in data/pg_wal: 000000010000000000000099.00000028.backup
Only in data/pg_wal: 00000001000000000000009A
Only in data/pg_wal: 00000001000000000000009B
Only in data/pg_wal: 00000001000000000000009C
Only in data/pg_wal: 00000001000000000000009D
Only in data/pg_wal: 00000001000000000000009E
Only in data/pg_wal/archive_status:
000000010000000000000099.00000028.backup.done
Only in data/: postmaster.opts
For now, what concerns me here is the following `LZ4F_compressUpdate()`
API, which does the core work of streaming compression:
size_t LZ4F_compressUpdate(LZ4F_cctx* cctx,
void* dstBuffer, size_t dstCapacity,
const void* srcBuffer, size_t srcSize,
const LZ4F_compressOptions_t* cOptPtr);
where `dstCapacity` comes from an earlier call to `LZ4F_compressBound()`,
which returns the minimum `dstCapacity` required to guarantee the success
of `LZ4F_compressUpdate()` for a given `srcSize` and `preferences` in the
worst case. `LZ4F_compressBound()` is:
size_t LZ4F_compressBound(size_t srcSize, const LZ4F_preferences_t* prefsPtr);
Now, the hard lesson here is that the `dstCapacity` returned by
`LZ4F_compressBound()` even for a `srcSize` of a single byte is about
~256K (it seems to have something to do with the blockSize we chose for
the lz4 frame; the minimum we can have is 64K), even though the actual
length of the data compressed by `LZ4F_compressUpdate()` is far smaller.
Meanwhile, the destination buffer length available to us, i.e.
`mysink->base.bbs_next->bbs_buffer_length`, is only 32K. If, in the call
to `LZ4F_compressUpdate()`, I directly pass
`mysink->base.bbs_next->bbs_buffer + bytes_written` as `dstBuffer` and
the value returned by `LZ4F_compressBound()` as `dstCapacity`, that
seems clearly incorrect, since the output buffer space actually
remaining is much less than what is calculated for the worst case by
`LZ4F_compressBound()`.
For now, I am creating a temporary buffer of the required size, passing
it to the compressor, asserting that the actual compressed bytes fit in
whatever space we have available, and then copying the result to our
output buffer.
To give an example, I put some logging statements, and I can see in the log:
"
bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
input size to be compressed: 512
estimated size for compressed buffer by LZ4F_compressBound(): 262667
actual compressed size: 16
"
I will really appreciate any inputs, comments, or suggestions here.
Regards,
Jeevan Ladhe
[1]: https://fossies.org/linux/lz4/doc/lz4frame_manual.html
[2]: /messages/by-id/CA+TgmoYgVN=-Yoh71r3P9N7eKysd7_9b9s+1QFfFcs3w7Z-tig@mail.gmail.com
Attachments:
lz4_compress_wip.patch
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ec50fbab12..8ba1095e4b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -54,7 +54,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -293,6 +294,8 @@ perform_base_backup(basebackup_options *opt)
/* Set up server-side compression, if client requested it */
if (opt->compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt->compression_level);
+ if (opt->compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -926,6 +929,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..593fda7e47
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,306 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include <unistd.h>
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Read the input buffer in CHUNK_SIZE-length chunks in each iteration and
+ * pass them to lz4 compression. Defined as 8k, since the input buffer is a
+ * multiple of BLCKSZ, i.e. a multiple of 8k.
+ */
+#define CHUNK_SIZE 8192
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+
+ /* Buffer for keeping the compressed bytes. */
+ void *compressed_buffer;
+ /* Length of the compressed buffer. */
+ size_t compressed_buffer_length;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ /*
+ * We need our own buffer, because we're going to pass different data
+ * to the next sink than what gets passed to us.
+ *
+ * We could try making the input buffer bigger than the output buffer,
+ * because we expect that compression is going to shrink the input data.
+ * However, as the compression ratio could be quite high (>10x) and to take
+ * full advantage of this we would need a huge input buffer. Instead
+ * it seems better to assume the input buffer may be filled multiple times
+ * before we succeed in filling the output buffer, and keep the input
+ * buffer relatively small. For now we just make it the same size as the
+ * output buffer.
+ */
+ sink->base.bbs_buffer_length = next->bbs_buffer_length;
+ sink->base.bbs_buffer = palloc(sink->base.bbs_buffer_length);
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+ size_t headerSize;
+
+ /* Initialize compressor object. */
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->frameInfo.blockMode = LZ4F_blockLinked;
+ prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+ prefs->frameInfo.frameType = LZ4F_frame;
+ prefs->frameInfo.contentSize = 0;
+ prefs->frameInfo.dictID = 0;
+ prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+ prefs->compressionLevel = 0;
+
+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into output
+ * buffer. We need to keep track of how many bytes have been cumulatively
+ * written into the output buffer(bytes_written). But,
+ * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+ * written to output buffer, set autoFlush to 1 to force the writing to the
+ * output buffer.
+ */
+ prefs->autoFlush = 1;
+
+ prefs->favorDecSpeed = 0;
+ prefs->reserved[0] = 0;
+ prefs->reserved[1] = 0;
+ prefs->reserved[2] = 0;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ CHUNK_SIZE,
+ prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /*
+ * We will be reading the input buffer in CHUNK_SIZE'd chunks.
+ * LZ4F_compressUpdate() needs an output buffer large enough to hold the
+ * compressed result even in the worst case. Allocate that buffer once
+ * and keep it around until we are done with archiving.
+ */
+ mysink->compressed_buffer_length = LZ4F_compressBound(CHUNK_SIZE,
+ &mysink->prefs);
+ mysink->compressed_buffer = palloc(mysink->compressed_buffer_length);
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ uint8 *next_in = (uint8 *) mysink->base.bbs_buffer;
+
+ while (avail_in > 0)
+ {
+ size_t compressedSize;
+ int nextChunkLen = CHUNK_SIZE;
+
+ /* Last chunk to be read from the input. */
+ if (avail_in < CHUNK_SIZE)
+ nextChunkLen = avail_in;
+
+ /*
+ * If we do not have enough space left in the output buffer for this
+ * chunk to be written, first archive the already written contents.
+ */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written ||
+ mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the next nextChunkLen bytes of the input buffer into the
+ * temporary buffer; the result is copied to the output buffer below.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->compressed_buffer,
+ mysink->compressed_buffer_length,
+ next_in, nextChunkLen,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * We should have enough space left in the out buffer to write this
+ * compressed buffer.
+ */
+ Assert(compressedSize <=
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written);
+
+ memcpy((uint8 *) mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->compressed_buffer, compressedSize);
+
+ /*
+ * Update our notion of how many bytes we've written into output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Advance the input start since we already read some data. */
+ next_in = (uint8 *) next_in + nextChunkLen;
+ avail_in = avail_in - nextChunkLen;
+ }
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ pfree(mysink->compressed_buffer);
+ mysink->compressed_buffer = NULL;
+ mysink->compressed_buffer_length = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 9236642d93..0f1c9d19c6 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -258,6 +258,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
On Wed, Sep 8, 2021 at 2:14 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
To give an example, I put some logging statements, and I can see in the log:
"
bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
input size to be compressed: 512
estimated size for compressed buffer by LZ4F_compressBound(): 262667
actual compressed size: 16
"
That is pretty lame. I don't know why it needs a ~256k buffer to
produce 16 bytes of output.
The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer. So if the LZ4 sink is told that its buffer should
be at least, I don't know, say 64kB, then it can compute how large an
output buffer the LZ4 library requires for 64kB. Hopefully we can
assume that liblz4 never needs a smaller buffer for a larger input.
Then we can assume that if a 64kB input requires, say, a 300kB output
buffer, every possible input < 64kB also requires an output buffer <=
300 kB.
But we can't just say, well, we were asked to create a 64kB buffer (or
whatever) so let's ask the next bbsink for a 300kB buffer (or
whatever), because then as soon as we write any data at all into it
the remaining buffer space might be insufficient for the next chunk.
So instead what I think we should do is have bbsink_lz4 set the size
of the next sink's buffer to its own buffer size +
LZ4F_compressBound(its own buffer size). So in this example if it's
asked to create a 64kB buffer and LZ4F_compressBound(64kB) = 300kB
then it asks the next sink to set the buffer size to 364kB. Now, that
means that there will always be at least 300 kB available in the
output buffer until we've accumulated a minimum of 64 kB of compressed
data, and then at that point we can flush.
I think this would be relatively clean and would avoid the need for
the double copying that the current design forced you to do. What do
you think?
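To sketch what I mean in code, with the caveat that the exact mechanism
by which one sink tells the next about the buffer size is still to be
worked out; assume for illustration that bbsink_begin_backup() grows a
buffer_length argument:

/*
 * Sketch only. The idea is that bbsink_lz4 asks the next sink for a
 * buffer big enough that, until a full input buffer's worth of
 * compressed data has accumulated, a worst-case LZ4F_compressUpdate()
 * still fits in the remaining space.
 */
static void
bbsink_lz4_begin_backup(bbsink *sink)
{
	bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
	size_t		output_buffer_bound;

	/* Worst-case output size for one full input buffer. */
	output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
											 &mysink->prefs);

	bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
						mysink->base.bbs_buffer_length + output_buffer_bound);
}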
+ /*
+ * If we do not have enough space left in the output buffer for this
+ * chunk to be written, first archive the already written contents.
+ */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length -
mysink->bytes_written ||
+ mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
I think this is flat-out wrong. It assumes that the compressor will
never generate more than N bytes of output given N bytes of input,
which is not true: with incompressible input, the framing overhead
alone can push the output past N bytes. Not sure there's much point in
fixing it now
because with the changes described above this code will have to change
anyway, but I think it's just lucky that this has worked for you in
your testing.
+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into output
+ * buffer. We need to keep track of how many bytes have been cumulatively
+ * written into the output buffer(bytes_written). But,
+ * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+ * written to output buffer, set autoFlush to 1 to force the writing to the
+ * output buffer.
+ */
+ prefs->autoFlush = 1;
I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?
Thanks for researching this!
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Sep 8, 2021 at 3:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer.
Here's a new patch set with that design change (and a bug fix for 0001).
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v4-0005-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
From 67e45f26a2035f714afdfa3c40e09a250448a228 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 9 Sep 2021 14:53:04 -0400
Subject: [PATCH v4 5/7] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
---
src/backend/replication/basebackup.c | 62 ++-
src/backend/replication/basebackup_copy.c | 266 ++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
src/tools/pgindent/typedefs.list | 3 +
5 files changed, 722 insertions(+), 53 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ecd32e8436..aefa7cb17e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -81,6 +88,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -233,7 +241,7 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- bbsink *sink = bbsink_copytblspc_new();
+ bbsink *sink;
bbsink *progress_sink;
/* Initial backup state, insofar as we know it now. */
@@ -243,6 +251,16 @@ perform_base_backup(basebackup_options *opt)
state.bytes_total = 0;
state.bytes_total_is_valid = false;
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -383,7 +401,10 @@ perform_base_backup(basebackup_options *opt)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(progress_sink);
@@ -621,6 +642,7 @@ perform_base_backup(basebackup_options *opt)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -688,8 +710,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -820,6 +844,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -1672,6 +1712,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that system, the
+ * server must not properly terminate the client archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 564f010188..389a520417 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result has the same format in every method.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,51 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -37,6 +101,17 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -48,6 +123,193 @@ const bbsink_ops bbsink_copytblspc_ops = {
.end_backup = bbsink_copytblspc_end_backup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins
+ * with the type byte we're going to need, and then arrange things so
+ * that the data we're given will be written just after that type byte.
+ * That will allow us to ship the data with a single call to pq_putmessage
+ * and without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 947a182e86..8221a8c9ac 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -167,6 +177,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -983,10 +1000,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1013,8 +1031,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1054,16 +1072,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1074,8 +1092,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1085,6 +1103,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1332,28 +1661,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1476,46 +1809,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 3a2206d82f..2047d0fa7a 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,6 +261,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b916f09165..54c67982f5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3765,7 +3765,10 @@ yyscan_t
z_stream
z_streamp
zic_t
+ArchiveStreamState
+backup_target_type
bbsink
+bbsink_copystream
bbsink_ops
bbsink_state
bbsink_throttle
--
2.24.3 (Apple Git-128)
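As a quick recap of the subprotocol the helpers above are parsing (my
summary of this code, not a formal protocol description): each CopyData
message begins with a one-byte type code, which the GetCopyData*
functions then pick apart, bailing out on anything malformed. The two
message types visible in this hunk are:

    'p'    progress report: an 8-byte count of bytes completed, sent
           in network byte order (hence the pg_ntoh64() call above)
    'm'    announcement that manifest data follows; this message
           itself carries no further payload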
Attachment: v4-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch (application/octet-stream)
From 88166b3ae168b3860288dc6304ff418b913ab8e7 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 11:33:32 -0400
Subject: [PATCH v4 1/7] Flexible options for BASE_BACKUP and
CREATE_REPLICATION_SLOT.
Previously, these replication commands used an entirely hard-coded
syntax, but that's hard to extend. Instead, adopt the same kind of
syntax we've used for SQL commands such as VACUUM, ANALYZE, COPY,
and EXPLAIN, where it's not necessary for all of the option names
to be parser keywords.
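For illustration, a command that the old grammar spells entirely with
dedicated keywords, e.g.

    BASE_BACKUP LABEL 'mylabel' FAST NOWAIT MAX_RATE 1024

becomes, under the new grammar, a parenthesized option list (this
particular spelling matches what the updated pg_basebackup generates;
see the BaseBackup() changes below):

    BASE_BACKUP (LABEL 'mylabel', FAST, WAIT 0, MAX_RATE 1024)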
This commit does not remove support for the old syntax. It just
adds the new one as an additional option, and makes pg_basebackup
prefer the new syntax when the server is new enough to support it.
v2: Fix compile error.
v3: Fix inverted test, as reported by Tushar Ahuja.
v4: Adjustments for v15.
v5: Adjustments for TWO_PHASE option.
---
src/backend/replication/basebackup.c | 33 ++---
.../libpqwalreceiver/libpqwalreceiver.c | 16 +--
src/backend/replication/repl_gram.y | 116 +++++++++++++++---
src/backend/replication/walsender.c | 17 +--
src/bin/pg_basebackup/pg_basebackup.c | 65 ++++++----
src/bin/pg_basebackup/streamutil.c | 102 +++++++++++++--
src/bin/pg_basebackup/streamutil.h | 12 ++
7 files changed, 278 insertions(+), 83 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e09108d0ec..b0b52d3b1a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "catalog/pg_type.h"
#include "common/file_perm.h"
+#include "commands/defrem.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
@@ -787,7 +788,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->label = strVal(defel->arg);
+ opt->label = defGetString(defel);
o_label = true;
}
else if (strcmp(defel->defname, "progress") == 0)
@@ -796,7 +797,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->progress = true;
+ opt->progress = defGetBoolean(defel);
o_progress = true;
}
else if (strcmp(defel->defname, "fast") == 0)
@@ -805,16 +806,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->fastcheckpoint = true;
+ opt->fastcheckpoint = defGetBoolean(defel);
o_fast = true;
}
- else if (strcmp(defel->defname, "nowait") == 0)
+ else if (strcmp(defel->defname, "wait") == 0)
{
if (o_nowait)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->nowait = true;
+ opt->nowait = !defGetBoolean(defel);
o_nowait = true;
}
else if (strcmp(defel->defname, "wal") == 0)
@@ -823,19 +824,19 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->includewal = true;
+ opt->includewal = defGetBoolean(defel);
o_wal = true;
}
else if (strcmp(defel->defname, "max_rate") == 0)
{
- long maxrate;
+ int64 maxrate;
if (o_maxrate)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- maxrate = intVal(defel->arg);
+ maxrate = defGetInt64(defel);
if (maxrate < MAX_RATE_LOWER || maxrate > MAX_RATE_UPPER)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
@@ -851,21 +852,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->sendtblspcmapfile = true;
+ opt->sendtblspcmapfile = defGetBoolean(defel);
o_tablespace_map = true;
}
- else if (strcmp(defel->defname, "noverify_checksums") == 0)
+ else if (strcmp(defel->defname, "verify_checksums") == 0)
{
if (o_noverify_checksums)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- noverify_checksums = true;
+ noverify_checksums = !defGetBoolean(defel);
o_noverify_checksums = true;
}
else if (strcmp(defel->defname, "manifest") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
bool manifest_bool;
if (o_manifest)
@@ -890,7 +891,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "manifest_checksums") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
if (o_manifest_checksums)
ereport(ERROR,
@@ -905,8 +906,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
o_manifest_checksums = true;
}
else
- elog(ERROR, "option \"%s\" not recognized",
- defel->defname);
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname));
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 19ea159af4..dd80929daa 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -872,26 +872,28 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
if (conn->logical)
{
- appendStringInfoString(&cmd, " LOGICAL pgoutput");
- if (two_phase)
- appendStringInfoString(&cmd, " TWO_PHASE");
+ appendStringInfoString(&cmd, " LOGICAL pgoutput (");
switch (snapshot_action)
{
case CRS_EXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " EXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, "EXPORT_SNAPSHOT TRUE");
break;
case CRS_NOEXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " NOEXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, "EXPORT_SNAPSHOT FALSE");
break;
case CRS_USE_SNAPSHOT:
- appendStringInfoString(&cmd, " USE_SNAPSHOT");
+ appendStringInfoString(&cmd, "USE_SNAPSHOT");
break;
}
+
+ if (two_phase)
+ appendStringInfoString(&cmd, ", TWO_PHASE");
+ appendStringInfoChar(&cmd, ')');
}
else
{
- appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+ appendStringInfoString(&cmd, " PHYSICAL (RESERVE_WAL)");
}
res = libpqrcv_PQexec(conn->streamConn, cmd.data);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..69e990cda3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -95,16 +95,16 @@ static SQLCmd *make_sqlcmd(void);
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_legacy_opt_list generic_option_list
+%type <defelt> base_backup_legacy_opt generic_option
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
%type <node> plugin_opt_arg
-%type <str> opt_slot var_name
+%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
-%type <list> create_slot_opt_list
-%type <defelt> create_slot_opt
+%type <list> create_slot_options create_slot_legacy_opt_list
+%type <defelt> create_slot_legacy_opt
%%
@@ -157,12 +157,24 @@ var_name: IDENT { $$ = $1; }
;
/*
+ * BASE_BACKUP ( option [ 'value' ] [, ...] )
+ *
+ * We also still support the legacy syntax:
+ *
* BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
* [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
* [MANIFEST %s] [MANIFEST_CHECKSUMS %s]
+ *
+ * Future options should be supported only using the new syntax.
*/
base_backup:
- K_BASE_BACKUP base_backup_opt_list
+ K_BASE_BACKUP '(' generic_option_list ')'
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ $$ = (Node *) cmd;
+ }
+ | K_BASE_BACKUP base_backup_legacy_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
@@ -170,14 +182,14 @@ base_backup:
}
;
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
+base_backup_legacy_opt_list:
+ base_backup_legacy_opt_list base_backup_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-base_backup_opt:
+base_backup_legacy_opt:
K_LABEL SCONST
{
$$ = makeDefElem("label",
@@ -200,8 +212,8 @@ base_backup_opt:
}
| K_NOWAIT
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("wait",
+ (Node *)makeInteger(false), -1);
}
| K_MAX_RATE UCONST
{
@@ -215,8 +227,8 @@ base_backup_opt:
}
| K_NOVERIFY_CHECKSUMS
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("verify_checksums",
+ (Node *)makeInteger(false), -1);
}
| K_MANIFEST SCONST
{
@@ -231,8 +243,8 @@ base_backup_opt:
;
create_replication_slot:
- /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
- K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL [options] */
+ K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -242,8 +254,8 @@ create_replication_slot:
cmd->options = $5;
$$ = (Node *) cmd;
}
- /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
- | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin [options] */
+ | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -256,14 +268,19 @@ create_replication_slot:
}
;
-create_slot_opt_list:
- create_slot_opt_list create_slot_opt
+create_slot_options:
+ '(' generic_option_list ')' { $$ = $2; }
+ | create_slot_legacy_opt_list { $$ = $1; }
+ ;
+
+create_slot_legacy_opt_list:
+ create_slot_legacy_opt_list create_slot_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-create_slot_opt:
+create_slot_legacy_opt:
K_EXPORT_SNAPSHOT
{
$$ = makeDefElem("export_snapshot",
@@ -422,6 +439,65 @@ plugin_opt_arg:
sql_cmd:
IDENT { $$ = (Node *) make_sqlcmd(); }
;
+
+generic_option_list:
+ generic_option_list ',' generic_option
+ { $$ = lappend($1, $3); }
+ | generic_option
+ { $$ = list_make1($1); }
+ ;
+
+generic_option:
+ ident_or_keyword
+ {
+ $$ = makeDefElem($1, NULL, -1);
+ }
+ | ident_or_keyword IDENT
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword SCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword UCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeInteger($2), -1);
+ }
+ ;
+
+ident_or_keyword:
+ IDENT { $$ = $1; }
+ | K_BASE_BACKUP { $$ = "base_backup"; }
+ | K_IDENTIFY_SYSTEM { $$ = "identify_system"; }
+ | K_SHOW { $$ = "show"; }
+ | K_START_REPLICATION { $$ = "start_replication"; }
+ | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
+ | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
+ | K_TIMELINE_HISTORY { $$ = "timeline_history"; }
+ | K_LABEL { $$ = "label"; }
+ | K_PROGRESS { $$ = "progress"; }
+ | K_FAST { $$ = "fast"; }
+ | K_WAIT { $$ = "wait"; }
+ | K_NOWAIT { $$ = "nowait"; }
+ | K_MAX_RATE { $$ = "max_rate"; }
+ | K_WAL { $$ = "wal"; }
+ | K_TABLESPACE_MAP { $$ = "tablespace_map"; }
+ | K_NOVERIFY_CHECKSUMS { $$ = "noverify_checksums"; }
+ | K_TIMELINE { $$ = "timeline"; }
+ | K_PHYSICAL { $$ = "physical"; }
+ | K_LOGICAL { $$ = "logical"; }
+ | K_SLOT { $$ = "slot"; }
+ | K_RESERVE_WAL { $$ = "reserve_wal"; }
+ | K_TEMPORARY { $$ = "temporary"; }
+ | K_TWO_PHASE { $$ = "two_phase"; }
+ | K_EXPORT_SNAPSHOT { $$ = "export_snapshot"; }
+ | K_NOEXPORT_SNAPSHOT { $$ = "noexport_snapshot"; }
+ | K_USE_SNAPSHOT { $$ = "use_snapshot"; }
+ | K_MANIFEST { $$ = "manifest"; }
+ | K_MANIFEST_CHECKSUMS { $$ = "manifest_checksums"; }
+ ;
+
%%
static SQLCmd *
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..cf7b87e48c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -891,7 +891,8 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
snapshot_action_given = true;
- *snapshot_action = CRS_USE_SNAPSHOT;
+ if (defGetBoolean(defel))
+ *snapshot_action = CRS_USE_SNAPSHOT;
}
else if (strcmp(defel->defname, "reserve_wal") == 0)
{
@@ -901,7 +902,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
reserve_wal_given = true;
- *reserve_wal = true;
+ *reserve_wal = defGetBoolean(defel);
}
else if (strcmp(defel->defname, "two_phase") == 0)
{
@@ -910,7 +911,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
two_phase_given = true;
- *two_phase = true;
+ *two_phase = defGetBoolean(defel);
}
else
elog(ERROR, "unrecognized option: %s", defel->defname);
@@ -980,7 +981,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (EXPORT_SNAPSHOT)")));
need_full_snapshot = true;
}
@@ -990,25 +991,25 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (XactIsoLevel != XACT_REPEATABLE_READ)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called in REPEATABLE READ isolation mode transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (FirstSnapshotSet)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called before any query",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
if (IsSubTransaction())
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called in a subtransaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (USE_SNAPSHOT)")));
need_full_snapshot = true;
}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7296eb97d0..0c8be22558 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1809,10 +1809,6 @@ BaseBackup(void)
TimeLineID latesttli;
TimeLineID starttli;
char *basebkp;
- char escaped_label[MAXPGPATH];
- char *maxrate_clause = NULL;
- char *manifest_clause = NULL;
- char *manifest_checksums_clause = "";
int i;
char xlogstart[64];
char xlogend[64];
@@ -1821,8 +1817,11 @@ BaseBackup(void)
int serverVersion,
serverMajor;
int writing_to_stdout;
+ bool use_new_option_syntax = false;
+ PQExpBufferData buf;
Assert(conn != NULL);
+ initPQExpBuffer(&buf);
/*
* Check server version. BASE_BACKUP command was introduced in 9.1, so we
@@ -1840,6 +1839,8 @@ BaseBackup(void)
serverver ? serverver : "'unknown'");
exit(1);
}
+ if (serverMajor >= 1500)
+ use_new_option_syntax = true;
/*
* If WAL streaming was requested, also check that the server is new
@@ -1870,20 +1871,42 @@ BaseBackup(void)
/*
* Start the actual backup
*/
- PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);
-
+ AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
+ if (estimatesize)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
+ if (includewal == FETCH_WAL)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
+ if (fastcheckpoint)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "FAST");
+ if (includewal != NO_WAL)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "WAIT", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "NOWAIT");
+ }
if (maxrate > 0)
- maxrate_clause = psprintf("MAX_RATE %u", maxrate);
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
+ maxrate);
+ if (format == 't')
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+ if (!verify_checksums)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax,
+ "VERIFY_CHECKSUMS", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax,
+ "NOVERIFY_CHECKSUMS");
+ }
if (manifest)
{
- if (manifest_force_encode)
- manifest_clause = "MANIFEST 'force-encode'";
- else
- manifest_clause = "MANIFEST 'yes'";
+ AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
- manifest_checksums_clause = psprintf("MANIFEST_CHECKSUMS '%s'",
- manifest_checksums);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
if (verbose)
@@ -1898,18 +1921,10 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s %s",
- escaped_label,
- estimatesize ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
- manifest_clause ? manifest_clause : "",
- manifest_checksums_clause);
+ if (use_new_option_syntax && buf.len > 0)
+ basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
+ else
+ basebkp = psprintf("BASE_BACKUP %s", buf.data);
if (PQsendQuery(conn, basebkp) == 0)
{
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..9232ef77e2 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -490,6 +490,7 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
{
PQExpBuffer query;
PGresult *res;
+ bool use_new_option_syntax = (PQserverVersion(conn) >= 150000);
query = createPQExpBuffer();
@@ -498,27 +499,51 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
Assert(!(two_phase && is_physical));
Assert(slot_name != NULL);
- /* Build query */
+ /* Build base portion of query */
appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
if (is_temporary)
appendPQExpBufferStr(query, " TEMPORARY");
if (is_physical)
- {
appendPQExpBufferStr(query, " PHYSICAL");
+ else
+ appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+
+ /* Add any requested options */
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(query, " (");
+ if (is_physical)
+ {
if (reserve_wal)
- appendPQExpBufferStr(query, " RESERVE_WAL");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "RESERVE_WAL");
}
else
{
- appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
if (two_phase && PQserverVersion(conn) >= 150000)
- appendPQExpBufferStr(query, " TWO_PHASE");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "TWO_PHASE");
- if (PQserverVersion(conn) >= 100000)
- /* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ /* pg_recvlogical doesn't use an exported snapshot, so suppress */
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(query, use_new_option_syntax,
+ "EXPORT_SNAPSHOT", 0);
+ else
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "NOEXPORT_SNAPSHOT");
+ }
+ if (use_new_option_syntax)
+ {
+ /* Suppress option list if it would be empty, otherwise terminate */
+ if (query->data[query->len - 1] == '(')
+ {
+ query->len -= 2;
+ query->data[query->len] = '\0';
+ }
+ else
+ appendPQExpBufferChar(query, ')');
}
+ /* Now run the query */
res = PQexec(conn, query->data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
@@ -603,6 +628,67 @@ DropReplicationSlot(PGconn *conn, const char *slot_name)
return true;
}
+/*
+ * Append a "plain" option - one with no value - to a server command that
+ * is being constructed.
+ *
+ * In the old syntax, all options were parser keywords, so you could just
+ * write things like SOME_COMMAND OPTION1 OPTION2 'optvalue' OPTION3 42. The
+ * new syntax uses a comma-separated list surrounded by parentheses, so the
+ * equivalent is SOME_COMMAND (OPTION1, OPTION2 'optvalue', OPTION3 42).
+ */
+void
+AppendPlainCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name)
+{
+ if (buf->len > 0 && buf->data[buf->len - 1] != '(')
+ {
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(buf, ", ");
+ else
+ appendPQExpBufferChar(buf, ' ');
+ }
+
+ appendPQExpBuffer(buf, " %s", option_name);
+}
+
+/*
+ * Append an option with an associated string value to a server command that
+ * is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendStringCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, char *option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ if (option_value != NULL)
+ {
+ size_t length = strlen(option_value);
+ char *escaped_value = palloc(1 + 2 * length);
+
+ PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
+ appendPQExpBuffer(buf, " '%s'", escaped_value);
+ pfree(escaped_value);
+ }
+}
+
+/*
+ * Append an option with an associated integer value to a server command
+ * that is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendIntegerCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, int32 option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ appendPQExpBuffer(buf, " %d", option_value);
+}
/*
* Frontend version of GetCurrentTimestamp(), since we are not linked with
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..65135c79e0 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -15,6 +15,7 @@
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"
#include "libpq-fe.h"
+#include "pqexpbuffer.h"
extern const char *progname;
extern char *connection_string;
@@ -40,6 +41,17 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
TimeLineID *starttli,
XLogRecPtr *startpos,
char **db_name);
+
+extern void AppendPlainCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name);
+extern void AppendStringCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, char *option_value);
+extern void AppendIntegerCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, int32 option_value);
+
extern bool RetrieveWalSegSize(PGconn *conn);
extern TimestampTz feGetCurrentTimestamp(void);
extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
--
2.24.3 (Apple Git-128)
Attachment: v4-0007-WIP-Server-side-gzip-compression.patch (application/octet-stream)
From 6bbec56492aa40730cf931debcc38188c8cc237a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 9 Sep 2021 19:41:47 -0400
Subject: [PATCH v4 7/7] WIP: Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
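As a usage sketch (the host and directory here are placeholders), a
tar-format backup compressed on the server at level 4 would be taken
with something like:

    pg_basebackup -h myserver -D /path/to/backupdir -Ft \
        --server-compression=gzip4

which should leave base.tar.gz, plus one <tablespaceoid>.tar.gz per
additional tablespace, in the target directory, since the gzip sink
adds ".gz" to each archive name.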
---
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 300 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 38 ++-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 379 insertions(+), 2 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 62f915e8b8..d6df3fdeb2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -292,6 +300,10 @@ perform_base_backup(basebackup_options *opt)
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt->compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt->compression_level);
+
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -740,11 +752,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str;
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -904,6 +918,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..d0d2c44cd9
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,300 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for then next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index c23cb2846f..38919fa6d9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -992,7 +993,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1001,14 +1004,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1731,6 +1752,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2141,6 +2173,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2320,6 +2353,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index c074da9313..f09aecb53b 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -263,6 +263,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
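To show how the new sink slots into the existing chain, here is a
simplified sketch of the setup in perform_base_backup() when
server-side gzip is requested; the bbsink_copystream_new() call at the
bottom of the chain is my assumption, since that part of the function
isn't visible in the hunk above:

    bbsink     *sink;

    /* Ultimate destination: stream the archives to the client. */
    sink = bbsink_copystream_new(true);

    /* Throttling, if requested, wraps the destination... */
    if (opt->maxrate > 0)
        sink = bbsink_throttle_new(sink, opt->maxrate);

    /* ...then compression, so the rate limit applies to the
     * compressed byte stream... */
    if (opt->compression == BACKUP_COMPRESSION_GZIP)
        sink = bbsink_gzip_new(sink, opt->compression_level);

    /* ...while progress reporting still sees the uncompressed
     * input. */
    sink = bbsink_progress_new(sink, opt->progress);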
Attachment: v4-0004-Introduce-bbstreamer-abstraction-to-modularize-pg.patch (application/octet-stream)
From 29ab6e07ed8fbbd21416a2cf9f907acb9e07532d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 12:00:34 -0400
Subject: [PATCH v4 4/7] Introduce 'bbstreamer' abstraction to modularize
pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and then gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These changes do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 +++++
src/bin/pg_basebackup/bbstreamer_file.c | 579 ++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 912 +++++-----------------
src/tools/pgindent/typedefs.list | 10 +
7 files changed, 1697 insertions(+), 727 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..8fda09dcd4 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -34,10 +34,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -60,7 +66,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..b24dc848c1
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content) (bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize) (bbstreamer *streamer);
+ void (*free) (bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
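+
+/*
+ * A typical calling pattern for bbstreamer_buffer_until (a sketch; the tar
+ * parser uses it this way) accumulates a fixed-size header across input
+ * chunks that may be split at arbitrary boundaries:
+ *
+ *     if (!bbstreamer_buffer_until(streamer, &data, &len, TAR_BLOCK_SIZE))
+ *         return;
+ *
+ * Once it returns true, streamer->bbs_buffer holds at least TAR_BLOCK_SIZE
+ * bytes, and 'data' and 'len' describe whatever input remains unconsumed.
+ */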
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
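+
+/*
+ * These constructors are designed to be chained, each one forwarding its
+ * output to the 'next' streamer supplied at creation time. As an
+ * illustrative sketch (error handling omitted), a plain-format restore
+ * pipeline might be assembled like this:
+ *
+ *     bbstreamer *streamer;
+ *
+ *     streamer = bbstreamer_extractor_new(basedir, link_map, report_file);
+ *     streamer = bbstreamer_recovery_injector_new(streamer, true, contents);
+ *     streamer = bbstreamer_tar_parser_new(streamer);
+ *
+ * Raw tar data pushed into the head of this chain emerges as files written
+ * under basedir; 'link_map', 'report_file', and 'contents' stand in for
+ * whatever callbacks and configuration the caller provides.
+ */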
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..03e1ea2550
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,579 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include <unistd.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map) (const char *);
+ void (*report_output_file) (const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it also identifies the
+ * file, which is opened for writing and closed when done.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any
+ * symbolic link, and which should return a replacement pathname to be used
+ * in its place. If NULL, the symbolic link target is used without
+ * modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a
+ * new output file. The pathname to that file is passed as an argument. If
+ * NULL, the call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = pstrdup(basepath);
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
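+
+/*
+ * For illustration, a link_map callback might apply a caller-maintained
+ * mapping and fall back to the original target (pg_basebackup passes its
+ * get_tablespace_mapping function here; 'lookup_mapping' below is
+ * hypothetical):
+ *
+ *     static const char *
+ *     my_link_map(const char *linktarget)
+ *     {
+ *         const char *mapped = lookup_mapping(linktarget);
+ *
+ *         return mapped != NULL ? mapped : linktarget;
+ *     }
+ */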
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6 clusters) will
+ * have been created by the wal receiver process. Also, when the WAL
+ * directory location was specified, pg_wal (or pg_xlog) has already
+ * been created as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ pfree(mystreamer->basepath);
+ pfree(mystreamer);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..4d15251fdc
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can inject recovery configuration into an
+ * archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
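+
+/*
+ * Since this bbstreamer both consumes and produces typed chunks, it must in
+ * practice sit downstream of a tar parser and, when the final output is a
+ * tar archive, upstream of a tar archiver that can regenerate the headers
+ * we invalidate. A sketch of that sandwich, with 'destination' and
+ * 'contents' standing in for whatever the caller supplies:
+ *
+ *     streamer = bbstreamer_tar_archiver_new(destination);
+ *     streamer = bbstreamer_recovery_injector_new(streamer, true, contents);
+ *     streamer = bbstreamer_tar_parser_new(streamer);
+ */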
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf; on
+ * older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
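+ *
+ * For example, pg_basebackup uses this to add the backup manifest to a tar
+ * archive streamed to standard output:
+ *
+ *     bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ *                            buf.data, buf.len);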
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..5a9f587dca
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,444 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
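+
+/*
+ * Raw data is fed to the parser as untyped chunks; for example, the COPY
+ * data callback in pg_basebackup boils down to:
+ *
+ *     bbstreamer_content(state->streamer, NULL, copybuf, r,
+ *                        BBSTREAMER_UNKNOWN);
+ */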
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+
+ /*
+ * If we're expecting an archive member header, accumulate a
+ * full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the file
+ * trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ if (mystreamer->base.bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not the
+ * start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
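+
+ /*
+ * For reference, the POSIX ustar offsets consulted below are: name at
+ * 0 (100 bytes), mode at 100, uid at 108, gid at 116, size at 124,
+ * typeflag at 156 ('5' = directory, '2' = symbolic link), and linkname
+ * at 157 (100 bytes).
+ */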
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Create a bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size indicates a bug in the
+ * upstream bbstreamer and trips an assertion.
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * blocks of zero bytes. Not all tar programs require this, but apparently
+ * some do. The server does not supply this trailer. If no archive trailer is
+ * present, one will be added by bbstreamer_tar_parser_finalize.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+ /* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0c8be22558..947a182e86 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -28,18 +28,13 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
-#include "common/string.h"
#include "fe_utils/option_utils.h"
#include "fe_utils/recovery_gen.h"
-#include "fe_utils/string_utils.h"
#include "getopt_long.h"
-#include "libpq-fe.h"
-#include "pgtar.h"
-#include "pgtime.h"
-#include "pqexpbuffer.h"
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
@@ -62,34 +57,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -161,10 +131,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -190,14 +161,15 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force,
- bool finished);
-
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force, bool finished);
+
+static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported);
+static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -360,21 +332,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -763,6 +720,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -775,8 +740,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* is moved to the next line.
*/
static void
-progress_report(int tablespacenum, const char *filename,
- bool force, bool finished)
+progress_report(int tablespacenum, bool force, bool finished)
{
int percent;
char totaldone_str[32];
@@ -816,7 +780,7 @@ progress_report(int tablespacenum, const char *filename,
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -832,7 +796,7 @@ progress_report(int tablespacenum, const char *filename,
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -846,7 +810,7 @@ progress_report(int tablespacenum, const char *filename,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -992,257 +956,170 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
}
/*
- * Write a piece of tar data
+ * Figure out what to do with an archive received from the server based on
+ * the options selected by the user. We may just write the results directly
+ * to a file, or we might compress first, or we might extract the tar file
+ * and write each member separately. This function doesn't do any of that
+ * directly, but it works out what kind of bbstreamer we need to create so
+ * that the right stuff happens when, down the road, we actually receive
+ * the data.
*/
-static void
-writeTarData(WriteTarState *state, char *buf, int r)
+static bbstreamer *
+CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported)
{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-}
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer = NULL;
+ bool inject_manifest;
+ bool must_parse_archive;
-/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
- *
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
- */
-static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- char zerobuf[TAR_BLOCK_SIZE * 2];
- WriteTarState state;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
+ /*
+ * Normally, we emit the backup manifest as a separate file, but when
+ * we're writing a tarfile to stdout, we don't have that option, so
+ * include it in the one tarfile we've got.
+ */
+ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ /*
+ * We have to parse the archive if (1) we're supposed to extract it, or if
+ * (2) we need to inject the backup manifest or recovery configuration
+ * into it.
+ */
+ must_parse_archive = (format == 'p' || inject_manifest ||
+ (spclocation == NULL && writerecoveryconf));
- if (state.basetablespace)
+ if (format == 'p')
{
+ const char *directory;
+
/*
- * Base tablespaces
+ * In plain format, we must extract the archive. The data for the main
+ * tablespace will be written to the base directory, and the data for
+ * other tablespaces will be written to the directory where they're
+ * located on the server, after applying any user-specified tablespace
+ * mappings.
*/
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
-
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
- else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ directory = spclocation == NULL ? basedir
+ : get_tablespace_mapping(spclocation);
+ streamer = bbstreamer_extractor_new(directory,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
+ FILE *archive_file;
+ char archive_filename[MAXPGPATH];
+
/*
- * Specific tablespace
+ * In tar format, we just write the archive without extracting it.
+ * Normally, we write it to the archive name provided by the caller,
+ * but when the base directory is "-" that means we need to write
+ * to standard output.
*/
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(archive_filename, sizeof(archive_filename), "-");
+ archive_file = stdout;
}
else
-#endif
{
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
+ snprintf(archive_filename, sizeof(archive_filename),
+ "%s/%s", basedir, archive_name);
+ archive_file = NULL;
}
- }
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(archive_filename, ".gz", sizeof(archive_filename));
+ streamer = bbstreamer_gzip_writer_new(archive_filename,
+ archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+
+ /*
+ * If we need to parse the archive for whatever reason, then we'll
+ * also need to re-archive, because, if the output format is tar, the
+ * only point of parsing the archive is to be able to inject stuff
+ * into it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = pg_strdup(archive_filename);
+ }
/*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
+ * If we're supposed to inject the backup manifest into the results,
+ * it should be done here, so that the file content can be injected
+ * directly, without worrying about the details of the tar format.
*/
+ if (inject_manifest)
+ manifest_inject_streamer = streamer;
- MemSet(zerobuf, 0, sizeof(zerobuf));
-
- if (state.basetablespace && writerecoveryconf)
+ /*
+ * If this is the main tablespace and we're supposed to write
+ * recovery information, arrange to do that.
+ */
+ if (spclocation == NULL && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ Assert(must_parse_archive);
+ streamer = bbstreamer_recovery_injector_new(streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ /*
+ * If we're doing anything that involves understanding the contents of
+ * the archive, we'll need to parse it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_parser_new(streamer);
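+
+ /*
+ * The chain now runs, outermost first: tar parser (when parsing is
+ * needed) -> recovery injector (when writing recovery settings) ->
+ * either tar archiver and [compressed] writer, for tar format, or
+ * extractor, for plain format.
+ */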
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ /* Return the results. */
+ *manifest_inject_streamer_p = manifest_inject_streamer;
+ return streamer;
+}
- writeTarData(&state, header, sizeof(header));
+/*
+ * Receive raw tar data from the server, and stream it to the appropriate
+ * location. If we're writing a single tarfile to standard output, also
+ * receive the backup manifest and inject it into that tarfile.
+ */
+static void
+ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum)
+{
+ WriteTarState state;
+ bbstreamer *manifest_inject_streamer;
+ bool is_recovery_guc_supported;
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ /* Pass all COPY data through to the backup streamer. */
+ memset(&state, 0, sizeof(state));
+ is_recovery_guc_supported =
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ state.streamer = CreateBackupStreamer(archive_name, spclocation,
+ &manifest_inject_streamer,
+ is_recovery_guc_supported);
+ state.tablespacenum = tablespacenum;
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ progress_filename = NULL;
/*
- * Normally, we emit the backup manifest as a separate file, but when
- * we're writing a tarfile to stdout, we don't have that option, so
- * include it in the one tarfile we've got.
+ * The decision as to whether we need to inject the backup manifest into
+ * the output at this stage is made by CreateBackupStreamer; if that is
+ * needed, manifest_inject_streamer will be non-NULL; otherwise, it will
+ * be NULL.
*/
- if (strcmp(basedir, "-") == 0 && manifest)
+ if (manifest_inject_streamer != NULL)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
+ /* Slurp the entire backup manifest into a buffer. */
initPQExpBuffer(&buf);
ReceiveBackupManifestInMemory(conn, &buf);
if (PQExpBufferDataBroken(buf))
@@ -1250,42 +1127,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
- termPQExpBuffer(&buf);
- }
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
+ /* Inject it into the output tarfile. */
+ bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ buf.data, buf.len);
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
+ /* Free memory. */
+ termPQExpBuffer(&buf);
}
- progress_report(rownum, state.filename, true, false);
+ /* Cleanup. */
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+
+ progress_report(tablespacenum, true, false);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1301,184 +1156,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
-
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
+ bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
+ progress_report(state->tablespacenum, false, false);
}
@@ -1503,242 +1184,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true, false);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2031,16 +1476,32 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
+ /* Receive a tar file for each tablespace in turn */
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named base.tar
+ * if it's the main data directory or <tablespaceoid>.tar if it's for
+ * another tablespace. CreateBackupStreamer() will arrange to add .gz
+ * to the archive name if pg_basebackup is performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
+ }
/*
* Now receive backup manifest, if appropriate.
@@ -2056,7 +1517,10 @@ BaseBackup(void)
ReceiveBackupManifest(conn);
if (showprogress)
- progress_report(PQntuples(res), NULL, true, true);
+ {
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true, true);
+ }
PQclear(res);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49b119a6cb..b916f09165 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3769,3 +3769,12 @@ bbsink
bbsink_ops
bbsink_state
bbsink_throttle
+bbstreamer
+bbstreamer_archive_context
+bbstreamer_bzip_writer
+bbstreamer_member
+bbstreamer_ops
+bbstreamer_plain_writer
+bbstreamer_recovery_injector
+bbstreamer_tar_archiver
+bbstreamer_tar_parser
--
2.24.3 (Apple Git-128)
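
To make the new client-side flow easier to review, here is a condensed
sketch (not part of the patch) of how pg_basebackup now drives a
bbstreamer chain. CreateBackupStreamer and the bbstreamer_* calls are
the ones that appear in the diff above; receive_chunk() is a
hypothetical stand-in for the PQgetCopyData loop, and error handling is
omitted. The manifest, when embedded, is added with
bbstreamer_inject_file() as shown in ReceiveTarFile above.

    /* Sketch only: drive a bbstreamer chain over one archive. */
    static void
    receive_one_archive(PGconn *conn, const char *archive_name,
                        const char *spclocation)
    {
        bbstreamer *streamer;
        bbstreamer *manifest_inject_streamer;
        char       *copybuf;
        size_t      r;

        /* Build the chain: parser, recovery injector, writer, as needed. */
        streamer = CreateBackupStreamer(archive_name, spclocation,
                                        &manifest_inject_streamer, true);

        /* Feed raw COPY data; each streamer transforms it and forwards. */
        while ((r = receive_chunk(conn, &copybuf)) > 0) /* hypothetical */
            bbstreamer_content(streamer, NULL, copybuf, r,
                               BBSTREAMER_UNKNOWN);

        /* Flush anything buffered, then tear down the whole chain. */
        bbstreamer_finalize(streamer);
        bbstreamer_free(streamer);
    }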
Attachment: v4-0006-Support-base-backup-targets.patch (application/octet-stream)
From e0136c35799e6f62e09103647826d6756064d43e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 14:56:52 -0400
Subject: [PATCH v4 6/7] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 301 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 197 +++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 556 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index aefa7cb17e..62f915e8b8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -253,14 +256,38 @@ perform_base_backup(basebackup_options *opt)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt->target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt->target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt->target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt->target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -711,6 +738,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -846,25 +875,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -876,6 +915,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 389a520417..9104455700 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -127,11 +130,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -204,8 +208,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -286,8 +294,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..dff930c3c9
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,301 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write a chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index 1606463291..d1927e4f81 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -121,7 +121,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index ef7e6bfb77..a910915ccd 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8221a8c9ac..c23cb2846f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -109,7 +109,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -126,6 +126,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -357,6 +358,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1221,15 +1224,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1301,24 +1311,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1683,7 +1701,33 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1778,8 +1822,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first once since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1790,7 +1839,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1873,7 +1923,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2007,8 +2057,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2030,7 +2083,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2064,6 +2117,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2114,7 +2168,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2155,6 +2209,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2287,18 +2344,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2308,6 +2397,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2324,6 +2423,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2357,8 +2459,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2379,6 +2491,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2386,6 +2499,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2395,6 +2511,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2437,11 +2556,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 2047d0fa7a..c074da9313 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,9 +261,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6007827b44..6af924b6d4 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 54c67982f5..eb44604d40 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3770,6 +3770,7 @@ backup_target_type
bbsink
bbsink_copystream
bbsink_ops
+bbsink_server
bbsink_state
bbsink_throttle
bbstreamer
--
2.24.3 (Apple Git-128)
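
As a note on extensibility: with the callback-table pattern used by
basebackup_server.c above, a new target mostly needs a constructor and
a bbsink_ops table. Purely as an illustration, and not something that
exists in this patch set, a do-nothing pass-through sink would look
roughly like this, using only the bbsink_forward_* helpers the patch
provides:

    /* Sketch only: the minimal bbsink, forwarding every callback. */
    typedef struct bbsink_noop
    {
        bbsink      base;
    } bbsink_noop;

    const bbsink_ops bbsink_noop_ops = {
        .begin_backup = bbsink_forward_begin_backup,
        .begin_archive = bbsink_forward_begin_archive,
        .archive_contents = bbsink_forward_archive_contents,
        .end_archive = bbsink_forward_end_archive,
        .begin_manifest = bbsink_forward_begin_manifest,
        .manifest_contents = bbsink_forward_manifest_contents,
        .end_manifest = bbsink_forward_end_manifest,
        .end_backup = bbsink_forward_end_backup
    };

    bbsink *
    bbsink_noop_new(bbsink *next)
    {
        bbsink_noop *sink = palloc0(sizeof(bbsink_noop));

        *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_noop_ops;
        sink->base.bbs_next = next;
        return &sink->base;
    }

A real sink would do its work in archive_contents/manifest_contents,
acting on the data in bbs_buffer before forwarding, the way the
'server' sink writes it out with FileWrite().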
Attachment: v4-0002-Refactor-basebackup.c-s-_tarWriteDir-function.patch (application/octet-stream)
From 6513a3b6de9f3885d6aea081c7fc05794854db2f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v4 2/7] Refactor basebackup.c's _tarWriteDir() function.
Sometimes, we replace a symbolic link that we find in the data
directory with an actual directory within the tarfile that we
create. _tarWriteDir was responsible both for making this
substitution and also for writing the tar header for the
resulting directory into the tar file. Make it do only the first
of those things, and rename to convert_link_to_directory.
Substantially larger refactoring of this source file is planned,
but this little bit seemed to make sense to commit
independently.
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b0b52d3b1a..7d1ddd2f9f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -71,8 +71,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1371,7 +1370,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1387,7 +1388,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1399,7 +1402,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1873,12 +1878,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1887,8 +1891,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.3 (Apple Git-128)
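
Before reading the next patch, it may help to see how the sinks get
composed: each constructor wraps its successor, so data flows from the
outermost sink inward until it reaches the one that talks libpq. The
following is condensed from perform_base_backup() in the diff below:

    /* Sketch: chain construction, innermost (final recipient) first. */
    bbsink *sink = bbsink_copytblspc_new();  /* sends archives to client */

    /* Optional rate limiting wraps whatever we have so far. */
    if (opt->maxrate > 0)
        sink = bbsink_throttle_new(sink, opt->maxrate);

    /* Progress reporting wraps the outermost sink. */
    sink = bbsink_progress_new(sink, opt->progress);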
Attachment: v4-0003-Introduce-bbsink-abstraction-to-modularize-base-b.patch (application/octet-stream)
From 0695c003f83baa335f89c9812c43c5fdd209ba00 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 11:45:50 -0400
Subject: [PATCH v4 3/7] Introduce 'bbsink' abstraction to modularize base
backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
understands knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup progress and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc', but in the future we might introduce
other options.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
---
src/backend/replication/Makefile | 4 +
src/backend/replication/backup_manifest.c | 28 +-
src/backend/replication/basebackup.c | 674 +++++-------------
src/backend/replication/basebackup_copy.c | 324 +++++++++
src/backend/replication/basebackup_progress.c | 250 +++++++
src/backend/replication/basebackup_sink.c | 115 +++
src/backend/replication/basebackup_throttle.c | 198 +++++
src/include/replication/backup_manifest.h | 5 +-
src/include/replication/basebackup_sink.h | 275 +++++++
src/tools/pgindent/typedefs.list | 4 +
10 files changed, 1363 insertions(+), 514 deletions(-)
create mode 100644 src/backend/replication/basebackup_copy.c
create mode 100644 src/backend/replication/basebackup_progress.c
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/backend/replication/basebackup_throttle.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..74b97cf126 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,10 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_copy.o \
+ basebackup_progress.o \
+ basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 04ca455ace..4fe11a3b5c 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -310,9 +311,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -352,38 +352,28 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
- *
- * We choose to read back the data from the temporary file in chunks of
- * size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
- * size, so it seems to make sense to match that value here.
+ * Send the backup manifest.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
- char manifestbuf[BLCKSZ];
size_t bytes_to_read;
size_t rc;
- bytes_to_read = Min(sizeof(manifestbuf),
+ bytes_to_read = Min(sink->bbs_buffer_length,
manifest->manifest_size - manifest_bytes_done);
- rc = BufFileRead(manifest->buffile, manifestbuf, bytes_to_read);
+ rc = BufFileRead(manifest->buffile, sink->bbs_buffer,
+ bytes_to_read);
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 7d1ddd2f9f..ecd32e8436 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -46,6 +43,16 @@
#include "utils/resowner.h"
#include "utils/timestamp.h"
+/*
+ * How much data do we want to send in one CopyData message? Note that
+ * this may also result in reading the underlying files in chunks of this
+ * size.
+ *
+ * NB: The buffer size is required to be a multiple of the system block
+ * size, so use that value instead if it's bigger than our preference.
+ */
+#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+
typedef struct
{
const char *label;
@@ -59,27 +66,25 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
+static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -90,46 +95,12 @@ static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
-/*
- * Size of each block sent into the tar stream for larger files.
- */
-#define TAR_SEND_SIZE 32768
-
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
-/* The starting XLOG position of the base backup. */
-static XLogRecPtr startptr;
-
/* Total number of checksum failures during base backup. */
static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -255,30 +226,29 @@ static const struct exclude_list_item noChecksumFiles[] = {
static void
perform_base_backup(basebackup_options *opt)
{
- TimeLineID starttli;
+ bbsink_state state;
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- List *tablespaces = NIL;
+ bbsink *sink = bbsink_copytblspc_new();
+ bbsink *progress_sink;
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ /* Initial backup state, insofar as we know it now. */
+ state.tablespaces = NIL;
+ state.tablespace_num = 0;
+ state.bytes_done = 0;
+ state.bytes_total = 0;
+ state.bytes_total_is_valid = false;
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -295,11 +265,11 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
- startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
- tblspc_map_file);
+ basebackup_progress_wait_checkpoint();
+ state.startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint,
+ &state.starttli,
+ labelfile, &state.tablespaces,
+ tblspc_map_file);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -312,7 +282,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -329,7 +298,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = -1;
- tablespaces = lappend(tablespaces, ti);
+ state.tablespaces = lappend(state.tablespaces, ti);
/*
* Calculate the total backup size by summing up the size of each
@@ -337,100 +306,53 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
- NULL);
+ tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
+ true, NULL, NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
+ state.bytes_total += tmp->size;
}
+ state.bytes_total_is_valid = true;
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
-
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, &state, SINK_BUFFER_LENGTH);
/* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
+ sendDir(sink, ".", 1, false, state.tablespaces,
+ sendtblspclinks, &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -438,32 +360,33 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ Assert(lnext(state.tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
+ bbsink_end_archive(sink);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -489,8 +412,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -501,7 +423,7 @@ perform_base_backup(basebackup_options *opt)
* shouldn't be such files, but if there are, there's little harm in
* including them.
*/
- XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLByteToSeg(state.startptr, startsegno, wal_segment_size);
XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
@@ -591,7 +513,6 @@ perform_base_backup(basebackup_options *opt)
{
char *walFileName = (char *) lfirst(lc);
int fd;
- char buf[TAR_SEND_SIZE];
size_t cnt;
pgoff_t len = 0;
@@ -630,22 +551,17 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
- while ((cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf),
+ while ((cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length,
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -674,7 +590,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +613,23 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
+ AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
+ endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -739,7 +655,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -951,155 +867,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", LSN_FORMAT_ARGS(ptr));
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
+ int bytes_done = 0,
len;
pg_checksum_context checksum_ctx;
@@ -1125,25 +901,23 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
- update_basebackup_progress(len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
+ elog(ERROR, "could not update checksum of file \"%s\"",
+ filename);
+
+ while (bytes_done < len)
{
- char buf[TAR_BLOCK_SIZE];
+ size_t remaining = len - bytes_done;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
+ memcpy(sink->bbs_buffer, content + bytes_done, nbytes);
+ bbsink_archive_contents(sink, nbytes);
+ bytes_done += nbytes;
}
- if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
- elog(ERROR, "could not update checksum of file \"%s\"",
- filename);
+ _tarWritePadding(sink, len);
AddFileToBackupManifest(manifest, NULL, filename, len,
(pg_time_t) statbuf.st_mtime, &checksum_ctx);
@@ -1157,7 +931,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1187,11 +961,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1210,8 +984,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1371,8 +1145,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1389,8 +1163,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1403,15 +1177,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1442,7 +1216,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1466,7 +1240,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1498,7 +1272,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1506,7 +1280,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1583,21 +1357,19 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
bool block_retry = false;
- char buf[TAR_SEND_SIZE];
uint16 checksum;
int checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
@@ -1618,7 +1390,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1659,9 +1431,11 @@ sendFile(const char *readfilename, const char *tarfilename,
*/
while (len < statbuf->st_size)
{
+ size_t remaining = statbuf->st_size - len;
+
/* Try to read some more data. */
- cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf), statbuf->st_size - len),
+ cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length, remaining),
len, readfilename, true);
/*
@@ -1678,7 +1452,7 @@ sendFile(const char *readfilename, const char *tarfilename,
- * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read a multiple of
- * BLCKSZ bytes.
+ * the buffer size is a multiple of BLCKSZ and we read a multiple of
+ * BLCKSZ bytes.
*/
- Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
if (verify_checksum && (cnt % BLCKSZ != 0))
{
@@ -1694,7 +1468,7 @@ sendFile(const char *readfilename, const char *tarfilename,
{
for (i = 0; i < cnt / BLCKSZ; i++)
{
- page = buf + BLCKSZ * i;
+ page = sink->bbs_buffer + BLCKSZ * i;
/*
* Only check pages which have not been modified since the
@@ -1704,7 +1478,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* this case. We also skip completely new pages, since they
* don't have a checksum yet.
*/
- if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+ if (!PageIsNew(page) && PageGetLSN(page) < sink->bbs_state->startptr)
{
checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
phdr = (PageHeader) page;
@@ -1726,7 +1500,8 @@ sendFile(const char *readfilename, const char *tarfilename,
/* Reread the failed block */
reread_cnt =
- basebackup_read_file(fd, buf + BLCKSZ * i,
+ basebackup_read_file(fd,
+ sink->bbs_buffer + BLCKSZ * i,
BLCKSZ, len + BLCKSZ * i,
readfilename,
false);
@@ -1773,34 +1548,29 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
/* Also feed it to the checksum machinery. */
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer, cnt) < 0)
elog(ERROR, "could not update checksum of base backup");
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
- if (len < statbuf->st_size)
+ while (len < statbuf->st_size)
{
- MemSet(buf, 0, sizeof(buf));
- while (len < statbuf->st_size)
- {
- cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
- elog(ERROR, "could not update checksum of base backup");
- update_basebackup_progress(cnt);
- len += cnt;
- throttle(cnt);
- }
+ size_t remaining = statbuf->st_size - len;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
+
+ MemSet(sink->bbs_buffer, 0, nbytes);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ nbytes) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ bbsink_archive_contents(sink, nbytes);
+ len += nbytes;
}
/*
@@ -1808,13 +1578,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
- }
+ _tarWritePadding(sink, len);
CloseTransientFile(fd);
@@ -1837,18 +1601,28 @@ sendFile(const char *readfilename, const char *tarfilename,
return true;
}
-
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[TAR_BLOCK_SIZE];
enum tarError rc;
if (!sizeonly)
{
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ /*
+ * As of this writing, the smallest supported block size is 1kB, which
+ * is twice TAR_BLOCK_SIZE. Since the buffer size is required to be a
+ * multiple of BLCKSZ, it should be safe to assume that the buffer is
+ * large enough to fit an entire tar block. We double-check by means of
+ * these assertions.
+ */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= BLCKSZ,
+ "BLCKSZ too small for tar block");
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ rc = tarCreateHeader(sink->bbs_buffer, filename, linktarget,
+ statbuf->st_size, statbuf->st_mode,
+ statbuf->st_uid, statbuf->st_gid,
statbuf->st_mtime);
switch (rc)
@@ -1870,134 +1644,48 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
- update_basebackup_progress(sizeof(h));
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
}
- return sizeof(h);
-}
-
-/*
- * If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
- */
-static void
-convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
-{
- /* If symlink, write it as a directory anyway */
-#ifndef WIN32
- if (S_ISLNK(statbuf->st_mode))
-#else
- if (pgwin32_is_junction(pathbuf))
-#endif
- statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
+ return TAR_BLOCK_SIZE;
}
/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
+ * Pad with zero bytes out to a multiple of TAR_BLOCK_SIZE.
*/
static void
-throttle(size_t increment)
+_tarWritePadding(bbsink *sink, int len)
{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
+ int pad = tarPaddingBytesRequired(len);
/*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
+ * As in _tarWriteHeader, it should be safe to assume that the buffer is
+ * large enough that we don't need to do this in multiple chunks.
*/
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+ Assert(pad <= TAR_BLOCK_SIZE);
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
-
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
+ if (pad > 0)
+ {
+ MemSet(sink->bbs_buffer, 0, pad);
+ bbsink_archive_contents(sink, pad);
}
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
}
/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
static void
-update_basebackup_progress(int64 delta)
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
+ /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+ if (S_ISLNK(statbuf->st_mode))
+#else
+ if (pgwin32_is_junction(pathbuf))
+#endif
+ statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
/*
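Before moving on to the new files, it may help to spell out what the
code above builds: calls made by basebackup.c enter at the outermost
sink and are forwarded inward, so with throttling enabled the chain is

    basebackup.c -> progress -> throttle -> copytblspc -> libpq client

There is only one data buffer for the whole chain; the innermost sink
allocates it in its begin_backup callback, and the forwarding sinks
pick up a pointer to it via bbsink_forward_begin_backup(), so no bytes
are copied as a chunk travels down the chain.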
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
new file mode 100644
index 0000000000..564f010188
--- /dev/null
+++ b/src/backend/replication/basebackup_copy.c
@@ -0,0 +1,324 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_copy.c
+ * send basebackup archives using one COPY OUT operation per
+ * tablespace, and an additional COPY OUT for the backup manifest
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_copy.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_copytblspc_begin_backup(bbsink *sink);
+static void bbsink_copytblspc_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copytblspc_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_archive(bbsink *sink);
+static void bbsink_copytblspc_begin_manifest(bbsink *sink);
+static void bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_manifest(bbsink *sink);
+static void bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendTablespaceList(List *tablespaces);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+
+const bbsink_ops bbsink_copytblspc_ops = {
+ .begin_backup = bbsink_copytblspc_begin_backup,
+ .begin_archive = bbsink_copytblspc_begin_archive,
+ .archive_contents = bbsink_copytblspc_archive_contents,
+ .end_archive = bbsink_copytblspc_end_archive,
+ .begin_manifest = bbsink_copytblspc_begin_manifest,
+ .manifest_contents = bbsink_copytblspc_manifest_contents,
+ .end_manifest = bbsink_copytblspc_end_manifest,
+ .end_backup = bbsink_copytblspc_end_backup
+};
+
+/*
+ * Create a new 'copytblspc' bbsink.
+ */
+bbsink *
+bbsink_copytblspc_new(void)
+{
+ bbsink *sink = palloc0(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
+
+ return sink;
+}
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_copytblspc_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ /* Create a suitable buffer. */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is sent as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_copytblspc_archive_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", LSN_FORMAT_ARGS(ptr));
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a result set via libpq describing the tablespace list.
+ */
+static void
+SendTablespaceList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
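Putting the callbacks above together, the message flow a client sees
for a backup with one user tablespace and a manifest looks roughly
like this (a sketch of message types only; 16385 is a made-up
tablespace OID):

    RowDescription, DataRow, CommandComplete      <- start LSN and TLI
    RowDescription, DataRow(s), CommandComplete   <- tablespace list
    CopyOutResponse, CopyData..., CopyDone        <- "16385.tar"
    CopyOutResponse, CopyData..., CopyDone        <- "base.tar" (incl. WAL)
    CopyOutResponse, CopyData..., CopyDone        <- backup manifest
    RowDescription, DataRow, CommandComplete      <- end LSN and TLI

This is the same wire protocol the deleted code in basebackup.c
produced; it has just moved into one file.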
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..79f4d9dea3
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress tracking, including but not
+ * limited to command progress reporting.
+ *
+ * This should be used even if the PROGRESS option to the replication
+ * command BASE_BACKUP is not specified. Without that option, we won't
+ * have tallied up the size of the files that are going to need to be
+ * backed up, but we can still report to the command progress reporting
+ * facility how much data we've processed.
+ *
+ * Moreover, we also use this as a convenient place to update certain
+ * fields of the bbsink_state. That work is accurately described as
+ * keeping track of our progress, but it's not just for introspection.
+ * We need those fields to be updated properly in order for base backups
+ * to work.
+ *
+ * This particular basebackup sink requires extra callbacks that most base
+ * backup sinks don't. Rather than cramming those into the interface, we just
+ * have a few extra functions here that basebackup.c can call. (We could put
+ * the logic directly into that file as it's fairly simple, but it seems
+ * cleaner to have everything related to progress reporting in one place.)
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+static void bbsink_progress_begin_backup(bbsink *sink);
+static void bbsink_progress_archive_contents(bbsink *sink, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress tracking functions and
+ * forwards data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink));
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_progress_ops;
+ sink->bbs_next = next;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of the
+ * backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL, -1);
+
+ return sink;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+
+ /*
+ * Report that we are now streaming database files as a base backup. Also
+ * advertise the number of tablespaces, and, if known, the estimated total
+ * backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ if (sink->bbs_state->bytes_total_is_valid)
+ val[1] = sink->bbs_state->bytes_total;
+ else
+ val[1] = -1;
+ val[2] = list_length(sink->bbs_state->tablespaces);
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ bbsink_forward_begin_backup(sink);
+}
+
+/*
+ * End-of-archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ /*
+ * We expect one archive per tablespace, so reaching the end of an archive
+ * also means reaching the end of a tablespace. (Some day we might have a
+ * reason to decouple these concepts.)
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (sink->bbs_state->tablespace_num < list_length(sink->bbs_state->tablespaces))
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ sink->bbs_state->tablespace_num + 1);
+
+ /* Delegate to next sink. */
+ bbsink_forward_end_archive(sink);
+
+ /*
+ * This is a convenient place to update the bbsink_state's notion of which
+ * is the current tablespace. Note that the bbsink_state object is shared
+ * across all bbsink objects involved, but we're the outermost one and
+ * this is the very last thing we do.
+ */
+ sink->bbs_state->tablespace_num++;
+}
+
+/*
+ * Handle progress tracking for new archive contents.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+
+ /* First update bbsink_state with # of bytes done. */
+ state->bytes_done += len;
+
+ /* Now forward to next sink. */
+ bbsink_forward_archive_contents(sink, len);
+
+ /* Prepare to set # of bytes done for command progress reporting. */
+ val[nparam++] = state->bytes_done;
+
+ /*
+ * We may also want to update # of total bytes, to avoid overflowing past
+ * 100% or the full size. This may make the total size number change as we
+ * approach the end of the backup (the estimate will always be wrong if
+ * WAL is included), but that's better than having the done column be
+ * bigger than the total.
+ */
+ if (state->bytes_total_is_valid && state->bytes_done > state->bytes_total)
+ val[nparam++] = state->bytes_done;
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+ Assert(state->tablespace_num >= list_length(state->tablespaces) - 1);
+ Assert(state->tablespace_num <= list_length(state->tablespaces));
+
+ /*
+ * We report having finished all tablespaces at this point, even if the
+ * archive for the main tablespace is still open, because what's going to
+ * be added is WAL files, not files that are really from the main
+ * tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = list_length(state->tablespaces);
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
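A concrete example of the clamping done in
bbsink_progress_archive_contents(): if the initial estimate put
backup_total at 100000000 bytes but WAL inclusion later pushes
bytes_done to 104857600, the sink reports 104857600 for both the
streamed and total columns from then on, so
pg_stat_progress_basebackup never shows more than 100% done.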
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..14104f50e8
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,115 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+/*
+ * Forward begin_backup callback.
+ *
+ * Only use this implementation if you want the bbsink you're implementing to
+ * share a buffer with the successor bbsink.
+ */
+void
+bbsink_forward_begin_backup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_state != NULL);
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+ sink->bbs_buffer = sink->bbs_next->bbs_buffer;
+}
+
+/*
+ * Forward begin_archive callback.
+ */
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+/*
+ * Forward archive_contents callback.
+ *
+ * Code that wants to use this should initialize its own bbs_buffer and
+ * bbs_buffer_length fields to the values from the successor sink. In cases
+ * where the buffer isn't shared, the data needs to be copied before forwarding
+ * the callback. We don't try to do that here, because there's really no
+ * reason to have separately allocated buffers containing identical data.
+ */
+void
+bbsink_forward_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_archive_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_archive callback.
+ */
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * Forward begin_manifest callback.
+ */
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward manifest_contents callback.
+ *
+ * As with the archive_contents callback, it's expected that the buffer is
+ * shared.
+ */
+void
+bbsink_forward_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_manifest callback.
+ */
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward end_backup callback.
+ */
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
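With these forwarding functions available, a new filter sink only has
to spell out the callbacks it actually cares about. Here is a minimal
sketch -- not part of the patch, and "noop" is an invented name -- of
a sink that shares its successor's buffer and passes everything
through unchanged:

    #include "postgres.h"

    #include "replication/basebackup_sink.h"

    static const bbsink_ops bbsink_noop_ops = {
        .begin_backup = bbsink_forward_begin_backup,
        .begin_archive = bbsink_forward_begin_archive,
        .archive_contents = bbsink_forward_archive_contents,
        .end_archive = bbsink_forward_end_archive,
        .begin_manifest = bbsink_forward_begin_manifest,
        .manifest_contents = bbsink_forward_manifest_contents,
        .end_manifest = bbsink_forward_end_manifest,
        .end_backup = bbsink_forward_end_backup
    };

    /* Create a do-nothing sink that forwards everything to 'next'. */
    bbsink *
    bbsink_noop_new(bbsink *next)
    {
        bbsink *sink = palloc0(sizeof(bbsink));

        *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_noop_ops;
        sink->bbs_next = next;

        return sink;
    }

A real filter, say a compressor, would supply its own
archive_contents callback and allocate a private buffer in
begin_backup rather than sharing one.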
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..1606463291
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The actual number of bytes, transfer of which may cause sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink);
+static void bbsink_throttle_archive_contents(bbsink *sink, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ bbsink_forward_begin_backup(sink);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (bbs_buffer_length / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only a whole multiple of throttling_sample was
+ * processed. The rest will be done during the next call of this function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
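To put numbers on the throttling arithmetic: with a hypothetical
MAX_RATE of 1024 kB/s, throttling_sample comes out to
1024 * 1024 / 8 = 131072 bytes, and elapsed_min_unit to
1000000 / 8 = 125000 microseconds. In other words, after every 128kB
of archive or manifest data the sink checks that at least 125ms have
elapsed and sleeps off the difference if the transfer is running
fast, which works out to the requested 1MB/s.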
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 099108910c..16ed7eec9b 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,7 +47,8 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
extern void FreeBackupManifest(backup_manifest_info *manifest);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..3a2206d82f
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,275 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * Taking a base backup produces one archive per tablespace directory,
+ * plus a backup manifest unless that feature has been disabled. The
+ * goal of the backup process is to put those archives and that manifest
+ * someplace, possibly after postprocessing them in some way. A 'bbsink'
+ * is an object to which those archives, and the manifest if present,
+ * can be sent.
+ *
+ * In practice, there will be a chain of 'bbsink' objects rather than
+ * just one, with callbacks being forwarded from one to the next,
+ * possibly with modification. Each object is responsible for a
+ * single task e.g. command progress reporting, throttling, or
+ * communication with the client.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Overall backup state shared by all bbsink objects for a backup.
+ *
+ * Before calling bbsink_begin_backup, the caller must initialize a bbsink_state
+ * object which will last for the lifetime of the backup, and must thereafter
+ * update it as required before each new call to a bbsink method. The bbsink
+ * will retain a pointer to the state object and will consult it to understand
+ * the progress of the backup.
+ *
+ * 'tablespaces' is a list of tablespaceinfo objects. It must be set before
+ * calling bbsink_begin_backup() and must not be modified thereafter.
+ *
+ * 'tablespace_num' is the index of the current tablespace within the list
+ * stored in 'tablespaces'.
+ *
+ * 'bytes_done' is the number of bytes read so far from $PGDATA.
+ *
+ * 'bytes_total' is the total number of bytes estimated to be present in
+ * $PGDATA, if we have estimated this.
+ *
+ * 'bytes_total_is_valid' is true if and only if a proper estimate has been
+ * stored into 'bytes_total'.
+ *
+ * 'startptr' and 'starttli' identify the point in the WAL stream at which
+ * the backup began. They must be set before calling bbsink_begin_backup()
+ * and must not be modified thereafter.
+ */
+typedef struct bbsink_state
+{
+ List *tablespaces;
+ int tablespace_num;
+ uint64 bytes_done;
+ uint64 bytes_total;
+ bool bytes_total_is_valid;
+ XLogRecPtr startptr;
+ TimeLineID starttli;
+} bbsink_state;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. Its size must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * 'bbs_state' is a pointer to the bbsink_state object for this backup.
+ * Every bbsink associated with this backup should point to the same
+ * underlying state object.
+ *
+ * In general it is expected that the values of these fields are set when
+ * a bbsink is created and that they do not change thereafter. It's OK
+ * to modify the data to which bbs_buffer or bbs_state point, but no changes
+ * should be made to the contents of this struct.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ char *bbs_buffer;
+ int bbs_buffer_length;
+ bbsink *bbs_next;
+ bbsink_state *bbs_state;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline
+ * functions rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /*
+ * This callback is invoked just once, at the very start of the backup.
+ * It must set bbs_buffer to point to a chunk of storage where at least
+ * bbs_buffer_length bytes of data can be written.
+ */
+ void (*begin_backup) (bbsink *sink);
+
+ /*
+ * For each archive transmitted to a bbsink, there will be one call to the
+ * begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ *
+ * Before invoking the archive_contents() callback, the caller should copy
+ * a number of bytes equal to what will be passed as len into bbs_buffer,
+ * but not more than bbs_buffer_length.
+ *
+ * It's generally good if the buffer is as full as possible before the
+ * archive_contents() callback is invoked, but it's not worth expending
+ * extra cycles to make sure it's absolutely 100% full.
+ */
+ void (*begin_archive) (bbsink *sink, const char *archive_name);
+ void (*archive_contents) (bbsink *sink, size_t len);
+ void (*end_archive) (bbsink *sink);
+
+ /*
+ * If a backup manifest is to be transmitted to a bbsink, there will be
+ * one call to the begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback. These calls will occur after all archives are transmitted.
+ *
+ * The rules for invoking the manifest_contents() callback are the same as
+ * for the archive_contents() callback above.
+ */
+ void (*begin_manifest) (bbsink *sink);
+ void (*manifest_contents) (bbsink *sink, size_t len);
+ void (*end_manifest) (bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup) (bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, bbsink_state *state, int buffer_length)
+{
+ Assert(sink != NULL);
+
+ Assert(buffer_length > 0);
+
+ sink->bbs_state = state;
+ sink->bbs_buffer_length = buffer_length;
+ sink->bbs_ops->begin_backup(sink);
+
+ Assert(sink->bbs_buffer != NULL);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /*
+ * The caller should make a reasonable attempt to fill the buffer before
+ * calling this function, so it shouldn't be completely empty. Nor should
+ * it be filled beyond capacity.
+ */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->archive_contents(sink, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /* See comments in bbsink_archive_contents. */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->manifest_contents(sink, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ Assert(sink->bbs_state->tablespace_num == list_length(sink->bbs_state->tablespaces));
+
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 423780652f..49b119a6cb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3765,3 +3765,7 @@ yyscan_t
z_stream
z_streamp
zic_t
+bbsink
+bbsink_ops
+bbsink_state
+bbsink_throttle
--
2.24.3 (Apple Git-128)
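To make the API above a little more concrete, here is a minimal sketch of
how the constructors declared in basebackup_sink.h chain together. It
follows the shape of perform_base_backup(); the relative order of the
progress and throttle sinks and the use of opt->progress as the estimate
flag are illustrative rather than exact:

bbsink     *sink;

/* The last sink in the chain actually ships bytes to the client. */
sink = bbsink_copytblspc_new();

/* Optionally rate-limit everything flowing into it. */
if (opt->maxrate > 0)
    sink = bbsink_throttle_new(sink, opt->maxrate);

/* Progress reporting sees the data first and forwards it onward. */
sink = bbsink_progress_new(sink, opt->progress);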
Thanks, Robert, for your response.
On Thu, Sep 9, 2021 at 1:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 8, 2021 at 2:14 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
To give an example, I put some logging statements, and I can see in the
log:
"
bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
input size to be compressed: 512
estimated size for compressed buffer by LZ4F_compressBound(): 262667
actual compressed size: 16
"That is pretty lame. I don't know why it needs a ~256k buffer to
produce 16 bytes of output.
As I mentioned earlier, I think it has something to do with the lz4
blocksize. Currently, I have chosen it as 256kB, which is 262144 bytes,
and here the LZ4F_compressBound() has returned 262667 for worst-case
accommodation of 512 bytes i.e. 262144(256kB) + 512 + I guess some
book-keeping bytes. If I choose to have blocksize as 64K, then this turns
out to be: 66059 which is 65536(64 kB) + 512 + bookkeeping bytes.
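In case it helps, these numbers are easy to reproduce with a tiny
standalone program against liblz4. This assumes a liblz4 new enough to
provide LZ4F_INIT_PREFERENCES, and the exact values can vary across
liblz4 versions; 262667 and 66059 are simply what I observed:

#include <stdio.h>
#include <lz4frame.h>

int
main(void)
{
    LZ4F_preferences_t prefs = LZ4F_INIT_PREFERENCES;

    /* Worst-case output for a 512-byte input with 256kB blocks. */
    prefs.frameInfo.blockSizeID = LZ4F_max256KB;
    printf("256kB blocks: %zu\n", LZ4F_compressBound(512, &prefs));

    /* Same input, but with 64kB blocks. */
    prefs.frameInfo.blockSizeID = LZ4F_max64KB;
    printf("64kB blocks: %zu\n", LZ4F_compressBound(512, &prefs));

    return 0;
}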
The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer. So if the LZ4 buffer is told that its buffer should
be at least, I don't know, say 64kB. Then it can compute how large an
output buffer the LZ4 library requires for 64kB. Hopefully we can
assume that liblz4 never needs a smaller buffer for a larger input.
Then we can assume that if a 64kB input requires, say, a 300kB output
buffer, every possible input < 64kB also requires an output buffer <=
300 kB.
I agree, this assumption is fair enough.
But we can't just say, well, we were asked to create a 64kB buffer (or
whatever) so let's ask the next bbsink for a 300kB buffer (or
whatever), because then as soon as we write any data at all into it
the remaining buffer space might be insufficient for the next chunk.
So instead what I think we should do is have bbsink_lz4 set the size
of the next sink's buffer to its own buffer size +
LZ4F_compressBound(its own buffer size). So in this example if it's
asked to create a 64kB buffer and LZ4F_compressBound(64kB) = 300kB
then it asks the next sink to set the buffer size to 364kB. Now, that
means that there will always be at least 300 kB available in the
output buffer until we've accumulated a minimum of 64 kB of compressed
data, and then at that point we can flush.
I think this would be relatively clean and would avoid the need for
the double copying that the current design forced you to do. What do
you think?
I think this should work.
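For my own understanding, here is roughly what I would expect the
begin_backup callback of a bbsink_lz4 to do under that scheme. This is
only a sketch -- bbsink_lz4 does not exist in the posted patches, and the
rounding to a BLCKSZ multiple is my assumption, since
bbsink_begin_backup() asserts the buffer length is one:

static void
bbsink_lz4_begin_backup(bbsink *sink)
{
    size_t      next_buf_len;

    /* Storage for the uncompressed data handed to this sink. */
    sink->bbs_buffer = palloc(sink->bbs_buffer_length);

    /*
     * Ask the next sink for its own buffer size plus the worst case for
     * one full input buffer. That way, until at least bbs_buffer_length
     * bytes of compressed data have accumulated, a full
     * LZ4F_compressUpdate() is guaranteed to fit.
     */
    next_buf_len = sink->bbs_buffer_length +
        LZ4F_compressBound(sink->bbs_buffer_length, NULL);
    next_buf_len = ((next_buf_len + BLCKSZ - 1) / BLCKSZ) * BLCKSZ;

    bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
                        (int) next_buf_len);
}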
+ /*
+  * If we do not have enough space left in the output buffer for this
+  * chunk to be written, first archive the already written contents.
+  */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written ||
+     mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+     bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+     mysink->bytes_written = 0;
+ }

I think this is flat-out wrong. It assumes that the compressor will
never generate more than N bytes of output given N bytes of input,
which is not true. Not sure there's much point in fixing it now
because with the changes described above this code will have to change
anyway, but I think it's just lucky that this has worked for you in
your testing.
I see your point. But for it to be accurate, I think we need to then
consider the return value of LZ4F_compressBound() to check if that
many bytes are available. But, as explained earlier, our output buffer is
already way smaller than that.
+ /*
+  * LZ4F_compressUpdate() returns the number of bytes written into output
+  * buffer. We need to keep track of how many bytes have been cumulatively
+  * written into the output buffer (bytes_written). But,
+  * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+  * written to output buffer, set autoFlush to 1 to force the writing to the
+  * output buffer.
+  */
+ prefs->autoFlush = 1;

I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?
This is needed to know the actual number of bytes written to the output
buffer. If autoFlush is set to 0, then LZ4F_compressUpdate() may return
either 0 or the number of bytes actually written, depending on whether it
has buffered the data or really flushed it to the output buffer.
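For reference, the relevant setup is just a couple of lines against the
lz4 frame API -- a fragment rather than the full patch, with error
handling shortened:

LZ4F_cctx  *ctx;
LZ4F_preferences_t prefs = LZ4F_INIT_PREFERENCES;
size_t      rc;

rc = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
if (LZ4F_isError(rc))
    elog(ERROR, "could not create lz4 compression context: %s",
         LZ4F_getErrorName(rc));

prefs.frameInfo.blockSizeID = LZ4F_max256KB;
prefs.autoFlush = 1;    /* make LZ4F_compressUpdate() report real output */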
IIUC, you are referring to the following comment for
bbsink_lz4_end_archive():
"
* There might be some data inside lz4's internal buffers; we need to get
* that flushed out, also finalize the lz4 frame and then get that forwarded
* to the successor sink as archive content.
"
I think it should be modified to:
"
* Finalize the lz4 frame and then get that forwarded to the successor sink
* as archive content.
"
Regards,
Jeevan Ladhe.
On Fri, Sep 10, 2021 at 5:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 8, 2021 at 3:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer.

Here's a new patch set with that design change (and a bug fix for 0001).
Seems like nothing has been done about the issue reported in [1].
This one line change shall fix the issue,
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -264,6 +264,8 @@ bbsink_gzip_end_archive(bbsink *sink)
bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
mysink->bytes_written = 0;
}
+
+ bbsink_forward_end_archive(sink);
}
[1]: /messages/by-id/CAFiTN-uhg4iKA7FGWxaG9J8WD_LTx655+AUW3_KiK1=SakQy4A@mail.gmail.com
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 13, 2021 at 6:03 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
+ /*
+  * If we do not have enough space left in the output buffer for this
+  * chunk to be written, first archive the already written contents.
+  */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written ||
+     mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+     bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+     mysink->bytes_written = 0;
+ }

I think this is flat-out wrong. It assumes that the compressor will
never generate more than N bytes of output given N bytes of input,
which is not true. Not sure there's much point in fixing it now
because with the changes described above this code will have to change
anyway, but I think it's just lucky that this has worked for you in
your testing.

I see your point. But for it to be accurate, I think we need to then
consider the return value of LZ4F_compressBound() to check if that
many bytes are available. But, as explained earlier, our output buffer is
already way smaller than that.
Well, in your last version of the patch, you kind of had two output
buffers: a bigger one that you use internally and then the "official"
one which is associated with the next sink. With my latest patch set
you should be able to make that go away by just arranging for the next
sink's buffer to be as big as you need it to be. But, if we were going
to stick with using an extra buffer, then the solution would not be to
do this, but to copy the internal buffer to the official buffer in
multiple chunks if needed. So don't bother doing this here but just
wait and see how much data you get and then chunk it to the next
sink's buffer, calling bbsink_archive_contents() multiple times if
required. That would be annoying and expensive so I'm glad we're not
doing it that way, but it could be done correctly.
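Something along these lines is what I mean -- purely illustrative, with
internal_buf and internal_len standing in for the extra buffer and its
fill level:

size_t      offset = 0;

while (offset < internal_len)
{
    size_t      chunk = Min(internal_len - offset,
                            (size_t) sink->bbs_next->bbs_buffer_length);

    /* Hand over at most one next-sink buffer's worth at a time. */
    memcpy(sink->bbs_next->bbs_buffer, internal_buf + offset, chunk);
    bbsink_archive_contents(sink->bbs_next, chunk);
    offset += chunk;
}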
+ /*
+  * LZ4F_compressUpdate() returns the number of bytes written into output
+  * buffer. We need to keep track of how many bytes have been cumulatively
+  * written into the output buffer (bytes_written). But,
+  * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+  * written to output buffer, set autoFlush to 1 to force the writing to the
+  * output buffer.
+  */
+ prefs->autoFlush = 1;

I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?

This is needed to know the actual number of bytes written to the output
buffer. If autoFlush is set to 0, then LZ4F_compressUpdate() may return
either 0 or the number of bytes actually written, depending on whether it
has buffered or really flushed data to the output buffer.
The problem is that if we autoflush, I think it will cause the
compression ratio to be less good. Try un-lz4ing a file that is
produced this way and then re-lz4 it and compare the size of the
re-lz4'd file to the original one. Compressors rely on postponing
decisions about how to compress until they've seen as much of the
input as possible, and flushing forces them to decide earlier, maybe
making a decision that isn't as good as it could have been. So I
believe we should look for a way of avoiding this. Now I realize
there's a problem there with doing that and also making sure the
output buffer is large enough, and I'm not quite sure how we solve
that problem, but there is probably a way to do it.
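In other words, something shaped like this for end_archive, with
autoFlush left at zero. This is only a sketch: it assumes a hypothetical
bbsink_lz4 struct with ctx and bytes_written fields, and a next-sink
buffer sized as discussed above so that the flush always fits:

static void
bbsink_lz4_end_archive(bbsink *sink)
{
    bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
    size_t      nbytes;

    /* Drain whatever still sits in lz4's internal buffer. */
    nbytes = LZ4F_flush(mysink->ctx,
                        sink->bbs_next->bbs_buffer + mysink->bytes_written,
                        sink->bbs_next->bbs_buffer_length - mysink->bytes_written,
                        NULL);
    if (LZ4F_isError(nbytes))
        elog(ERROR, "lz4 flush failed: %s", LZ4F_getErrorName(nbytes));
    mysink->bytes_written += nbytes;

    /* Finalize the frame. */
    nbytes = LZ4F_compressEnd(mysink->ctx,
                              sink->bbs_next->bbs_buffer + mysink->bytes_written,
                              sink->bbs_next->bbs_buffer_length - mysink->bytes_written,
                              NULL);
    if (LZ4F_isError(nbytes))
        elog(ERROR, "lz4 compressEnd failed: %s", LZ4F_getErrorName(nbytes));
    mysink->bytes_written += nbytes;

    /* Forward the compressed tail, then end the archive downstream. */
    bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
    mysink->bytes_written = 0;
    bbsink_forward_end_archive(sink);
}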
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems like nothing has been done about the issue reported in [1]
This one line change shall fix the issue,
Oops. Try this version.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v5-0006-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
From 5d62beebe4135a6d3b7c57c19190a0d564a84ef7 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 9 Sep 2021 14:53:04 -0400
Subject: [PATCH v5 6/8] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
---
src/backend/replication/basebackup.c | 62 ++-
src/backend/replication/basebackup_copy.c | 266 ++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
src/tools/pgindent/typedefs.list | 3 +
5 files changed, 722 insertions(+), 53 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ecd32e8436..aefa7cb17e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -81,6 +88,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -233,7 +241,7 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- bbsink *sink = bbsink_copytblspc_new();
+ bbsink *sink;
bbsink *progress_sink;
/* Initial backup state, insofar as we know it now. */
@@ -243,6 +251,16 @@ perform_base_backup(basebackup_options *opt)
state.bytes_total = 0;
state.bytes_total_is_valid = false;
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -383,7 +401,10 @@ perform_base_backup(basebackup_options *opt)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(progress_sink);
@@ -621,6 +642,7 @@ perform_base_backup(basebackup_options *opt)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -688,8 +710,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -820,6 +844,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -1672,6 +1712,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that system, the
+ * server must not properly terminate the client archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 564f010188..389a520417 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result has the same format in every method.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,51 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -37,6 +101,17 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -48,6 +123,193 @@ const bbsink_ops bbsink_copytblspc_ops = {
.end_backup = bbsink_copytblspc_end_backup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins
+ * with the type byte we're going to need, and then arrange things so
+ * that the data we're given will be written just after that type byte.
+ * That will allow us to ship the data with a single call to pq_putmessage
+ * and without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 947a182e86..8221a8c9ac 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -167,6 +177,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -983,10 +1000,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1013,8 +1031,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1054,16 +1072,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1074,8 +1092,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1085,6 +1103,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1332,28 +1661,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1476,46 +1809,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 3a2206d82f..2047d0fa7a 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,6 +261,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b916f09165..54c67982f5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3765,7 +3765,10 @@ yyscan_t
z_stream
z_streamp
zic_t
+ArchiveStreamState
+backup_target_type
bbsink
+bbsink_copystream
bbsink_ops
bbsink_state
bbsink_throttle
--
2.24.3 (Apple Git-128)
v5-0001-Flexible-options-for-BASE_BACKUP.patch
From bd36c10d4413d0e2be4f3bab3a1de1a9736886c6 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 10 Sep 2021 11:50:05 -0400
Subject: [PATCH v5 1/8] Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
---
doc/src/sgml/protocol.sgml | 68 ++++++++++++--------
src/backend/replication/basebackup.c | 33 +++++-----
src/backend/replication/repl_gram.y | 93 +++++++++++++++++++++++----
src/bin/pg_basebackup/pg_basebackup.c | 65 ++++++++++++-------
src/bin/pg_basebackup/streamutil.c | 61 ++++++++++++++++++
src/bin/pg_basebackup/streamutil.h | 12 ++++
6 files changed, 254 insertions(+), 78 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..32d1eeabdc 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2517,8 +2517,7 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
- <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ]
- <indexterm><primary>BASE_BACKUP</primary></indexterm>
+ <term><literal>BASE_BACKUP</literal> [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
</term>
<listitem>
<para>
@@ -2540,52 +2539,55 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>PROGRESS</literal></term>
+ <term><literal>PROGRESS [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Request information required to generate a progress report. This will
- send back an approximate size in the header of each tablespace, which
- can be used to calculate how far along the stream is done. This is
- calculated by enumerating all the file sizes once before the transfer
- is even started, and might as such have a negative impact on the
- performance. In particular, it might take longer before the first data
+ If set to true, request information required to generate a progress
+ report. This will send back an approximate size in the header of each
+ tablespace, which can be used to calculate how far along the stream
+ is done. This is calculated by enumerating all the file sizes once
+ before the transfer is even started, and might as such have a
+ negative impact on the performance. In particular, it might take
+ longer before the first data
is streamed. Since the database files can change during the backup,
the size is only approximate and might both grow and shrink between
the time of approximation and the sending of the actual files.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>FAST</literal></term>
+ <term><literal>FAST [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Request a fast checkpoint.
+ If set to true, a fast checkpoint is requested.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>WAL</literal></term>
+ <term><literal>WAL [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Include the necessary WAL segments in the backup. This will include
- all the files between start and stop backup in the
+ If set to true, include the necessary WAL segments in the backup.
+ This will include all the files between start and stop backup in the
<filename>pg_wal</filename> directory of the base directory tar
- file.
+ file. The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>NOWAIT</literal></term>
+ <term><literal>WAIT [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- By default, the backup will wait until the last required WAL
+ If set to true, the backup will wait until the last required WAL
segment has been archived, or emit a warning if log archiving is
- not enabled. Specifying <literal>NOWAIT</literal> disables both
- the waiting and the warning, leaving the client responsible for
- ensuring the required log is available.
+ not enabled. If false, the backup will neither wait nor warn,
+ leaving the client responsible for ensuring the required log is
+ available. The default is true.
</para>
</listitem>
</varlistentry>
@@ -2605,25 +2607,25 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>TABLESPACE_MAP</literal></term>
+ <term><literal>TABLESPACE_MAP [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Include information about symbolic links present in the directory
- <filename>pg_tblspc</filename> in a file named
+ If true, include information about symbolic links present in the
+ directory <filename>pg_tblspc</filename> in a file named
<filename>tablespace_map</filename>. The tablespace map file includes
each symbolic link name as it exists in the directory
<filename>pg_tblspc/</filename> and the full path of that symbolic link.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <term><literal>VERIFY_CHECKSUMS [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- By default, checksums are verified during a base backup if they are
- enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
- this verification.
+ If true, checksums are verified during a base backup if they are
+ enabled. If false, this verification is skipped. The default is true.
</para>
</listitem>
</varlistentry>
@@ -2708,6 +2710,7 @@ The commands accepted in replication mode are:
</varlistentry>
</variablelist>
</para>
+
<para>
After the second regular result set, one or more CopyOutResponse results
will be sent, one for the main data directory and one for each additional tablespace other
@@ -2788,6 +2791,17 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ]
+ </term>
+ <listitem>
+ <para>
+ For compatibility with older releases, this alternative syntax for
+ the <literal>BASE_BACKUP</literal> command is still supported.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
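To make the option conversion concrete for reviewers, here is the same
request spelled both ways (illustrative only; the option order is not
significant):

    BASE_BACKUP LABEL 'nightly' FAST NOWAIT NOVERIFY_CHECKSUMS MAX_RATE 1024
    BASE_BACKUP (LABEL 'nightly', FAST, WAIT 0, VERIFY_CHECKSUMS 0, MAX_RATE 1024)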
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e09108d0ec..b0b52d3b1a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "catalog/pg_type.h"
#include "common/file_perm.h"
+#include "commands/defrem.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
@@ -787,7 +788,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->label = strVal(defel->arg);
+ opt->label = defGetString(defel);
o_label = true;
}
else if (strcmp(defel->defname, "progress") == 0)
@@ -796,7 +797,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->progress = true;
+ opt->progress = defGetBoolean(defel);
o_progress = true;
}
else if (strcmp(defel->defname, "fast") == 0)
@@ -805,16 +806,16 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->fastcheckpoint = true;
+ opt->fastcheckpoint = defGetBoolean(defel);
o_fast = true;
}
- else if (strcmp(defel->defname, "nowait") == 0)
+ else if (strcmp(defel->defname, "wait") == 0)
{
if (o_nowait)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->nowait = true;
+ opt->nowait = !defGetBoolean(defel);
o_nowait = true;
}
else if (strcmp(defel->defname, "wal") == 0)
@@ -823,19 +824,19 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->includewal = true;
+ opt->includewal = defGetBoolean(defel);
o_wal = true;
}
else if (strcmp(defel->defname, "max_rate") == 0)
{
- long maxrate;
+ int64 maxrate;
if (o_maxrate)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- maxrate = intVal(defel->arg);
+ maxrate = defGetInt64(defel);
if (maxrate < MAX_RATE_LOWER || maxrate > MAX_RATE_UPPER)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
@@ -851,21 +852,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->sendtblspcmapfile = true;
+ opt->sendtblspcmapfile = defGetBoolean(defel);
o_tablespace_map = true;
}
- else if (strcmp(defel->defname, "noverify_checksums") == 0)
+ else if (strcmp(defel->defname, "verify_checksums") == 0)
{
if (o_noverify_checksums)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- noverify_checksums = true;
+ noverify_checksums = !defGetBoolean(defel);
o_noverify_checksums = true;
}
else if (strcmp(defel->defname, "manifest") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
bool manifest_bool;
if (o_manifest)
@@ -890,7 +891,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "manifest_checksums") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
if (o_manifest_checksums)
ereport(ERROR,
@@ -905,8 +906,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
o_manifest_checksums = true;
}
else
- elog(ERROR, "option \"%s\" not recognized",
- defel->defname);
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname));
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..ce51a5e322 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -95,13 +95,13 @@ static SQLCmd *make_sqlcmd(void);
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_legacy_opt_list generic_option_list
+%type <defelt> base_backup_legacy_opt generic_option
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
%type <node> plugin_opt_arg
-%type <str> opt_slot var_name
+%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
@@ -157,12 +157,24 @@ var_name: IDENT { $$ = $1; }
;
/*
+ * BASE_BACKUP ( option [ 'value' ] [, ...] )
+ *
+ * We also still support the legacy syntax:
+ *
* BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
* [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
* [MANIFEST %s] [MANIFEST_CHECKSUMS %s]
+ *
+ * Future options should be supported only using the new syntax.
*/
base_backup:
- K_BASE_BACKUP base_backup_opt_list
+ K_BASE_BACKUP '(' generic_option_list ')'
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ $$ = (Node *) cmd;
+ }
+ | K_BASE_BACKUP base_backup_legacy_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
@@ -170,14 +182,14 @@ base_backup:
}
;
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
+base_backup_legacy_opt_list:
+ base_backup_legacy_opt_list base_backup_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-base_backup_opt:
+base_backup_legacy_opt:
K_LABEL SCONST
{
$$ = makeDefElem("label",
@@ -200,8 +212,8 @@ base_backup_opt:
}
| K_NOWAIT
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("wait",
+ (Node *)makeInteger(false), -1);
}
| K_MAX_RATE UCONST
{
@@ -215,8 +227,8 @@ base_backup_opt:
}
| K_NOVERIFY_CHECKSUMS
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("verify_checksums",
+ (Node *)makeInteger(false), -1);
}
| K_MANIFEST SCONST
{
@@ -422,6 +434,65 @@ plugin_opt_arg:
sql_cmd:
IDENT { $$ = (Node *) make_sqlcmd(); }
;
+
+generic_option_list:
+ generic_option_list ',' generic_option
+ { $$ = lappend($1, $3); }
+ | generic_option
+ { $$ = list_make1($1); }
+ ;
+
+generic_option:
+ ident_or_keyword
+ {
+ $$ = makeDefElem($1, NULL, -1);
+ }
+ | ident_or_keyword IDENT
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword SCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword UCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeInteger($2), -1);
+ }
+ ;
+
+ident_or_keyword:
+ IDENT { $$ = $1; }
+ | K_BASE_BACKUP { $$ = "base_backup"; }
+ | K_IDENTIFY_SYSTEM { $$ = "identify_system"; }
+ | K_SHOW { $$ = "show"; }
+ | K_START_REPLICATION { $$ = "start_replication"; }
+ | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
+ | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
+ | K_TIMELINE_HISTORY { $$ = "timeline_history"; }
+ | K_LABEL { $$ = "label"; }
+ | K_PROGRESS { $$ = "progress"; }
+ | K_FAST { $$ = "fast"; }
+ | K_WAIT { $$ = "wait"; }
+ | K_NOWAIT { $$ = "nowait"; }
+ | K_MAX_RATE { $$ = "max_rate"; }
+ | K_WAL { $$ = "wal"; }
+ | K_TABLESPACE_MAP { $$ = "tablespace_map"; }
+ | K_NOVERIFY_CHECKSUMS { $$ = "noverify_checksums"; }
+ | K_TIMELINE { $$ = "timeline"; }
+ | K_PHYSICAL { $$ = "physical"; }
+ | K_LOGICAL { $$ = "logical"; }
+ | K_SLOT { $$ = "slot"; }
+ | K_RESERVE_WAL { $$ = "reserve_wal"; }
+ | K_TEMPORARY { $$ = "temporary"; }
+ | K_TWO_PHASE { $$ = "two_phase"; }
+ | K_EXPORT_SNAPSHOT { $$ = "export_snapshot"; }
+ | K_NOEXPORT_SNAPSHOT { $$ = "noexport_snapshot"; }
+ | K_USE_SNAPSHOT { $$ = "use_snapshot"; }
+ | K_MANIFEST { $$ = "manifest"; }
+ | K_MANIFEST_CHECKSUMS { $$ = "manifest_checksums"; }
+ ;
+
%%
static SQLCmd *
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7296eb97d0..0c8be22558 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1809,10 +1809,6 @@ BaseBackup(void)
TimeLineID latesttli;
TimeLineID starttli;
char *basebkp;
- char escaped_label[MAXPGPATH];
- char *maxrate_clause = NULL;
- char *manifest_clause = NULL;
- char *manifest_checksums_clause = "";
int i;
char xlogstart[64];
char xlogend[64];
@@ -1821,8 +1817,11 @@ BaseBackup(void)
int serverVersion,
serverMajor;
int writing_to_stdout;
+ bool use_new_option_syntax = false;
+ PQExpBufferData buf;
Assert(conn != NULL);
+ initPQExpBuffer(&buf);
/*
* Check server version. BASE_BACKUP command was introduced in 9.1, so we
@@ -1840,6 +1839,8 @@ BaseBackup(void)
serverver ? serverver : "'unknown'");
exit(1);
}
+ if (serverMajor >= 1500)
+ use_new_option_syntax = true;
/*
* If WAL streaming was requested, also check that the server is new
@@ -1870,20 +1871,42 @@ BaseBackup(void)
/*
* Start the actual backup
*/
- PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);
-
+ AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
+ if (estimatesize)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
+ if (includewal == FETCH_WAL)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
+ if (fastcheckpoint)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "FAST");
+ if (includewal != NO_WAL)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "WAIT", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "NOWAIT");
+ }
if (maxrate > 0)
- maxrate_clause = psprintf("MAX_RATE %u", maxrate);
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
+ maxrate);
+ if (format == 't')
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+ if (!verify_checksums)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax,
+ "VERIFY_CHECKSUMS", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax,
+ "NOVERIFY_CHECKSUMS");
+ }
if (manifest)
{
- if (manifest_force_encode)
- manifest_clause = "MANIFEST 'force-encode'";
- else
- manifest_clause = "MANIFEST 'yes'";
+ AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
- manifest_checksums_clause = psprintf("MANIFEST_CHECKSUMS '%s'",
- manifest_checksums);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
if (verbose)
@@ -1898,18 +1921,10 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s %s",
- escaped_label,
- estimatesize ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
- manifest_clause ? manifest_clause : "",
- manifest_checksums_clause);
+ if (use_new_option_syntax && buf.len > 0)
+ basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
+ else
+ basebkp = psprintf("BASE_BACKUP %s", buf.data);
if (PQsendQuery(conn, basebkp) == 0)
{
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..d782b81adc 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -603,6 +603,67 @@ DropReplicationSlot(PGconn *conn, const char *slot_name)
return true;
}
+/*
+ * Append a "plain" option - one with no value - to a server command that
+ * is being constructed.
+ *
+ * In the old syntax, all options were parser keywords, so you could just
+ * write things like SOME_COMMAND OPTION1 OPTION2 'opt2value' OPTION3 42. The
+ * new syntax uses a comma-separated list surrounded by parentheses, so the
+ * equivalent is SOME_COMMAND (OPTION1, OPTION2 'opt2value', OPTION3 42).
+ */
+void
+AppendPlainCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name)
+{
+ if (buf->len > 0 && buf->data[buf->len - 1] != '(')
+ {
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(buf, ", ");
+ else
+ appendPQExpBufferChar(buf, ' ');
+ }
+
+ appendPQExpBuffer(buf, " %s", option_name);
+}
+
+/*
+ * Append an option with an associated string value to a server command that
+ * is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendStringCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, char *option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ if (option_value != NULL)
+ {
+ size_t length = strlen(option_value);
+ char *escaped_value = palloc(1 + 2 * length);
+
+ PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
+ appendPQExpBuffer(buf, " '%s'", escaped_value);
+ pfree(escaped_value);
+ }
+}
+
+/*
+ * Append an option with an associated integer value to a server command
+ * that is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendIntegerCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, int32 option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ appendPQExpBuffer(buf, " %d", option_value);
+}
/*
* Frontend version of GetCurrentTimestamp(), since we are not linked with
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..65135c79e0 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -15,6 +15,7 @@
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"
#include "libpq-fe.h"
+#include "pqexpbuffer.h"
extern const char *progname;
extern char *connection_string;
@@ -40,6 +41,17 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
TimeLineID *starttli,
XLogRecPtr *startpos,
char **db_name);
+
+extern void AppendPlainCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name);
+extern void AppendStringCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, char *option_value);
+extern void AppendIntegerCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, int32 option_value);
+
extern bool RetrieveWalSegSize(PGconn *conn);
extern TimestampTz feGetCurrentTimestamp(void);
extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
--
2.24.3 (Apple Git-128)
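For review purposes, here is a minimal standalone sketch (my own
illustration, not part of the patch) of the serialization behavior the
new Append*CommandOption helpers implement. It deliberately uses a
fixed-size buffer and no escaping, so it glosses over the PQExpBuffer
plumbing and the PQescapeStringConn call that AppendStringCommandOption
performs:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Append one option to the option list under construction. 'quote'
     * distinguishes string values (quoted) from integer values (bare).
     */
    static void
    append_option(char *buf, size_t bufsz, bool new_syntax,
                  const char *name, const char *value, bool quote)
    {
        size_t used = strlen(buf);

        if (used > 0)
            used += snprintf(buf + used, bufsz - used, "%s",
                             new_syntax ? ", " : " ");
        if (value == NULL)
            snprintf(buf + used, bufsz - used, "%s", name);
        else if (quote)
            snprintf(buf + used, bufsz - used, "%s '%s'", name, value);
        else
            snprintf(buf + used, bufsz - used, "%s %s", name, value);
    }

    int
    main(void)
    {
        char opts[256] = "";
        bool new_syntax = true;   /* as when serverMajor >= 1500 */

        append_option(opts, sizeof(opts), new_syntax, "LABEL", "nightly", true);
        append_option(opts, sizeof(opts), new_syntax, "FAST", NULL, false);
        append_option(opts, sizeof(opts), new_syntax, "MAX_RATE", "1024", false);
        append_option(opts, sizeof(opts), new_syntax, "WAIT", "0", false);

        if (new_syntax)
            printf("BASE_BACKUP (%s)\n", opts);
        else
            printf("BASE_BACKUP %s\n", opts);
        return 0;
    }

Run as-is this prints BASE_BACKUP (LABEL 'nightly', FAST, MAX_RATE 1024,
WAIT 0); with new_syntax set to false it degrades to the legacy
space-separated form, minus the NOWAIT/NOVERIFY_CHECKSUMS spellings that
the real code still emits when talking to older servers.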
v5-0008-WIP-Server-side-gzip-compression.patch
From 12dd8b6e2b81c933861ea817f30d8796d98eb0cd Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 13 Sep 2021 12:07:01 -0400
Subject: [PATCH v5 8/8] WIP: Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN', where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN', it will compress the
generated tar files on the server side using gzip, either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used, (2) -R is used, or (3) -D- is used without --no-manifest.
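
As an illustration (assuming a v15 server built with zlib), a
server-compressed tar-format backup at level 4 would be taken with
something like:

    pg_basebackup -Ft -D /tmp/backup --server-compression=gzip4

and should leave base.tar.gz (plus one .tar.gz per additional
tablespace) in the output directory, since the gzip sink appends ".gz"
to each archive name.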
---
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 303 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 38 ++-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 382 insertions(+), 2 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 62f915e8b8..d6df3fdeb2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -292,6 +300,10 @@ perform_base_backup(basebackup_options *opt)
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt->compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt->compression_level);
+
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -740,11 +752,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str;
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -904,6 +918,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..3d2fa93e55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,303 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if, say, there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As in bbsink_gzip_archive_contents(), but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index c23cb2846f..38919fa6d9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -992,7 +993,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1001,14 +1004,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1731,6 +1752,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2141,6 +2173,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2320,6 +2353,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index c074da9313..f09aecb53b 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -263,6 +263,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
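For reviewers who want to see the deflateInit2()/Z_FINISH dance from
basebackup_gzip.c in isolation, here is a minimal standalone sketch (my
own illustration, not part of the patch; compile with -lz). It performs
the same deflateInit2() call as bbsink_gzip_begin_archive() — window
bits 15 + 16 requests a gzip rather than zlib header — and finishes the
stream with Z_FINISH as bbsink_gzip_end_archive() does, just without the
chunked buffer management:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int
    main(void)
    {
        const char *input = "hello, base backup";
        unsigned char out[256];
        z_stream zs;

        memset(&zs, 0, sizeof(zs));
        if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8,
                         Z_DEFAULT_STRATEGY) != Z_OK)
            return 1;

        zs.next_in = (unsigned char *) input;
        zs.avail_in = strlen(input);
        zs.next_out = out;
        zs.avail_out = sizeof(out);

        /* One-shot: all input is present, so Z_FINISH must reach
         * Z_STREAM_END here; the sink loops instead, because its
         * output buffer can fill up. */
        if (deflate(&zs, Z_FINISH) != Z_STREAM_END)
            return 1;

        /* 0x1f 0x8b is the gzip magic; a zlib header would start 0x78. */
        printf("%lu bytes, header bytes 0x%02x 0x%02x\n",
               zs.total_out, out[0], out[1]);
        return deflateEnd(&zs) == Z_OK ? 0 : 1;
    }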
v5-0007-Support-base-backup-targets.patch
From 878374f1bb1ddc9bf634943e542757b52bca2585 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 14:56:52 -0400
Subject: [PATCH v5 7/8] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
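
As an illustration (my example, not from the patch; modulo option order
and whatever a given configuration adds by default), an invocation like

    pg_basebackup -Xnone -t server:/backups/nightly

should result in pg_basebackup sending roughly

    BASE_BACKUP (LABEL 'pg_basebackup base backup', PROGRESS,
        MANIFEST 'yes', TARGET 'server', TARGET_DETAIL '/backups/nightly')

while '-t blackhole' would send TARGET 'blackhole' and no TARGET_DETAIL;
the server insists on a detail for 'server' and rejects one for the
other targets.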
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 301 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 197 +++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 556 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index aefa7cb17e..62f915e8b8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -253,14 +256,38 @@ perform_base_backup(basebackup_options *opt)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt->target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt->target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt->target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt->target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -711,6 +738,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -846,25 +875,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -876,6 +915,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 389a520417..9104455700 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -127,11 +130,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -204,8 +208,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -286,8 +294,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..dff930c3c9
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,301 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index 1606463291..d1927e4f81 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -121,7 +121,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index ef7e6bfb77..a910915ccd 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8221a8c9ac..c23cb2846f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -109,7 +109,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar), '\0' means unspecified */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -126,6 +126,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -357,6 +358,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1221,15 +1224,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1301,24 +1311,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1683,7 +1701,33 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1778,8 +1822,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first one since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1790,7 +1839,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1873,7 +1923,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2007,8 +2057,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2030,7 +2083,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2064,6 +2117,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2114,7 +2168,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2155,6 +2209,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2287,18 +2344,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2308,6 +2397,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2324,6 +2423,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2357,8 +2459,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2379,6 +2491,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2386,6 +2499,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2395,6 +2511,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2437,11 +2556,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 2047d0fa7a..c074da9313 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,9 +261,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6007827b44..6af924b6d4 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 54c67982f5..eb44604d40 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3770,6 +3770,7 @@ backup_target_type
bbsink
bbsink_copystream
bbsink_ops
+bbsink_server
bbsink_state
bbsink_throttle
bbstreamer
--
2.24.3 (Apple Git-128)
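(An aside for reviewers, not part of the patch: once the server-side
sink added by bbsink_server_new() is wired up as a target, the new
option would be used roughly like

    pg_basebackup -h myhost -X none -t server:/backups/mybackup

with everything after the colon shipped to the server as TARGET_DETAIL.
The target name "server" and the path are illustrative; -X none is
needed because WAL cannot be included when a backup target is given.)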
Attachment: v5-0005-Introduce-bbstreamer-abstraction-to-modularize-pg.patch
From 3c15a0ea5695e8c1cbd0ec696506f641d5b8a4f8 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 12:00:34 -0400
Subject: [PATCH v5 5/8] Introduce 'bbstreamer' abstraction to modularize
pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unfortunately, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and then gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These changes do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
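As an aside, here is a minimal sketch -- not taken from the patch -- of
how these objects chain together for the scenario described above, using
the constructors declared in bbstreamer.h. The file name, compression
level, and the recoveryconfcontents/copybuf/r variables are placeholders,
and all error handling and COPY-protocol plumbing is omitted:

    bbstreamer *streamer;

    /* Innermost step: gzip the rebuilt archive and write it to a file. */
    streamer = bbstreamer_gzip_writer_new("base.tar.gz", NULL, 9);

    /* Regenerate tar headers for any members modified upstream. */
    streamer = bbstreamer_tar_archiver_new(streamer);

    /* Edit postgresql.auto.conf / standby.signal as chunks stream past. */
    streamer = bbstreamer_recovery_injector_new(streamer, true,
                                                recoveryconfcontents);

    /* Outermost step: parse the raw server bytes into typed chunks. */
    streamer = bbstreamer_tar_parser_new(streamer);

    /* Feed each chunk received from the server into the pipeline... */
    bbstreamer_content(streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);

    /* ...and, at end of stream, finalize and free the whole chain. */
    bbstreamer_finalize(streamer);
    bbstreamer_free(streamer);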
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 +++++
src/bin/pg_basebackup/bbstreamer_file.c | 579 ++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 912 +++++-----------------
src/tools/pgindent/typedefs.list | 10 +
7 files changed, 1697 insertions(+), 727 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..8fda09dcd4 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -34,10 +34,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -60,7 +66,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..b24dc848c1
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content) (bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize) (bbstreamer *streamer);
+ void (*free) (bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
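To illustrate the pattern this header establishes (again, a sketch for
review purposes, not part of the patch), a do-nothing pass-through
bbstreamer would look roughly like the following; the "passthrough"
names are invented, but the ops-table idiom matches the real
implementations in the files below:

    typedef struct bbstreamer_passthrough
    {
        bbstreamer  base;
        /* per-instance bookkeeping fields would go here */
    } bbstreamer_passthrough;

    static void
    bbstreamer_passthrough_content(bbstreamer *streamer,
                                   bbstreamer_member *member,
                                   const char *data, int len,
                                   bbstreamer_archive_context context)
    {
        /* Inspect or transform the chunk here, then forward it on. */
        bbstreamer_content(streamer->bbs_next, member, data, len, context);
    }

    static void
    bbstreamer_passthrough_finalize(bbstreamer *streamer)
    {
        bbstreamer_finalize(streamer->bbs_next);
    }

    static void
    bbstreamer_passthrough_free(bbstreamer *streamer)
    {
        bbstreamer_free(streamer->bbs_next);
        pfree(streamer);
    }

    static const bbstreamer_ops bbstreamer_passthrough_ops = {
        .content = bbstreamer_passthrough_content,
        .finalize = bbstreamer_passthrough_finalize,
        .free = bbstreamer_passthrough_free
    };

    bbstreamer *
    bbstreamer_passthrough_new(bbstreamer *next)
    {
        bbstreamer_passthrough *streamer;

        streamer = palloc0(sizeof(bbstreamer_passthrough));
        streamer->base.bbs_ops = &bbstreamer_passthrough_ops;
        streamer->base.bbs_next = next;
        return &streamer->base;
    }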
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..03e1ea2550
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,579 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include <unistd.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map) (const char *);
+ void (*report_output_file) (const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it is also opened and
+ * closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any
+ * symbolic link, and which should return a replacement pathname to be used
+ * in its place. If NULL, the symbolic link target is used without
+ * modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a
+ * new output file. The pathname to that file is passed as an argument. If
+ * NULL, the call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = pstrdup(basepath);
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6 clusters) will
+ * have been created by the wal receiver process. Also, when the WAL
+ * directory location was specified, pg_wal (or pg_xlog) has already
+ * been created as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ pfree(mystreamer->basepath);
+ pfree(mystreamer);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..4d15251fdc
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can edit recovery data into an archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+extern bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf; on
+ * older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..5a9f587dca
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,444 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+extern bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+
+ /*
+ * If we're expecting an archive member header, accumulate a
+ * full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the file
+ * trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ if (streamer->bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not the
+ * start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Create a bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+extern bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size is a fatal error (and
+ * indicates a bug).
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * blocks of zero bytes. Not all tar programs require this, but apparently
+ * some do. The server does not supply this trailer. If no archive trailer is
+ * present, one will be added by bbstreamer_tar_parser_finalize.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+ /* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
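For reference, here is the sequence of typed chunks the parser above
would hand to its successor for a single 10-byte archive member
(an illustrative trace, not from the patch):

    BBSTREAMER_MEMBER_HEADER    512 bytes  (the raw tar header block)
    BBSTREAMER_MEMBER_CONTENTS   10 bytes  (possibly split across calls)
    BBSTREAMER_MEMBER_TRAILER   502 bytes  (tarPaddingBytesRequired(10))

and, once per archive, after the last member's trailer:

    BBSTREAMER_ARCHIVE_TRAILER  whatever trailing blocks were buffered,
                                possibly zero-length; the tar archiver
                                then replaces that with two 512-byte
                                blocks of zeros.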
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0c8be22558..947a182e86 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -28,18 +28,13 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
-#include "common/string.h"
#include "fe_utils/option_utils.h"
#include "fe_utils/recovery_gen.h"
-#include "fe_utils/string_utils.h"
#include "getopt_long.h"
-#include "libpq-fe.h"
-#include "pgtar.h"
-#include "pgtime.h"
-#include "pqexpbuffer.h"
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
@@ -62,34 +57,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -161,10 +131,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -190,14 +161,15 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force,
- bool finished);
-
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force, bool finished);
+
+static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported);
+static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -360,21 +332,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -763,6 +720,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -775,8 +740,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* is moved to the next line.
*/
static void
-progress_report(int tablespacenum, const char *filename,
- bool force, bool finished)
+progress_report(int tablespacenum, bool force, bool finished)
{
int percent;
char totaldone_str[32];
@@ -816,7 +780,7 @@ progress_report(int tablespacenum, const char *filename,
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -832,7 +796,7 @@ progress_report(int tablespacenum, const char *filename,
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -846,7 +810,7 @@ progress_report(int tablespacenum, const char *filename,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -992,257 +956,170 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
}
/*
- * Write a piece of tar data
+ * Figure out what to do with an archive received from the server based on
+ * the options selected by the user. We may just write the results directly
+ * to a file, or we might compress first, or we might extract the tar file
+ * and write each member separately. This function doesn't do any of that
+ * directly, but it works out what kind of bbstreamer we need to create so
+ * that the right stuff happens when, down the road, we actually receive
+ * the data.
*/
-static void
-writeTarData(WriteTarState *state, char *buf, int r)
+static bbstreamer *
+CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported)
{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-}
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer = NULL;
+ bool inject_manifest;
+ bool must_parse_archive;
-/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
- *
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
- */
-static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- char zerobuf[TAR_BLOCK_SIZE * 2];
- WriteTarState state;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
+ /*
+ * Normally, we emit the backup manifest as a separate file, but when
+ * we're writing a tarfile to stdout, we don't have that option, so
+ * include it in the one tarfile we've got.
+ */
+ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ /*
+ * We have to parse the archive if (1) we're supposed to extract it, or if
+ * (2) we need to inject backup_manifest or recovery configuration into it.
+ */
+ must_parse_archive = (format == 'p' || inject_manifest ||
+ (spclocation == NULL && writerecoveryconf));
- if (state.basetablespace)
+ if (format == 'p')
{
+ const char *directory;
+
/*
- * Base tablespaces
+ * In plain format, we must extract the archive. The data for the main
+ * tablespace will be written to the base directory, and the data for
+ * other tablespaces will be written to the directory where they're
+ * located on the server, after applying any user-specified tablespace
+ * mappings.
*/
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
-
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
- else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ directory = spclocation == NULL ? basedir
+ : get_tablespace_mapping(spclocation);
+ streamer = bbstreamer_extractor_new(directory,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
+ FILE *archive_file;
+ char archive_filename[MAXPGPATH];
+
/*
- * Specific tablespace
+ * In tar format, we just write the archive without extracting it.
+ * Normally, we write it to the archive name provided by the caller,
+ * but when the base directory is "-" that means we need to write
+ * to standard output.
*/
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(archive_filename, sizeof(archive_filename), "-");
+ archive_file = stdout;
}
else
-#endif
{
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
+ snprintf(archive_filename, sizeof(archive_filename),
+ "%s/%s", basedir, archive_name);
+ archive_file = NULL;
}
- }
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(archive_filename, ".gz", sizeof(archive_filename));
+ streamer = bbstreamer_gzip_writer_new(archive_filename,
+ archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+
+ /*
+ * If we need to parse the archive for whatever reason, then we'll
+ * also need to re-archive, because, if the output format is tar, the
+ * only point of parsing the archive is to be able to inject stuff
+ * into it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = archive_filename;
+ }
/*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
+ * If we're supposed to inject the backup manifest into the results,
+ * it should be done here, so that the file content can be injected
+ * directly, without worrying about the details of the tar format.
*/
+ if (inject_manifest)
+ manifest_inject_streamer = streamer;
- MemSet(zerobuf, 0, sizeof(zerobuf));
-
- if (state.basetablespace && writerecoveryconf)
+ /*
+ * If this is the main tablespace and we're supposed to write
+ * recovery information, arrange to do that.
+ */
+ if (spclocation == NULL && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ Assert(must_parse_archive);
+ streamer = bbstreamer_recovery_injector_new(streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ /*
+ * If we're doing anything that involves understanding the contents of
+ * the archive, we'll need to parse it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_parser_new(streamer);
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ /* Return the results. */
+ *manifest_inject_streamer_p = manifest_inject_streamer;
+ return streamer;
+}
- writeTarData(&state, header, sizeof(header));
+/*
+ * Receive raw tar data from the server, and stream it to the appropriate
+ * location. If we're writing a single tarfile to standard output, also
+ * receive the backup manifest and inject it into that tarfile.
+ */
+static void
+ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum)
+{
+ WriteTarState state;
+ bbstreamer *manifest_inject_streamer;
+ bool is_recovery_guc_supported;
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ /* Pass all COPY data through to the backup streamer. */
+ memset(&state, 0, sizeof(state));
+ is_recovery_guc_supported =
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ state.streamer = CreateBackupStreamer(archive_name, spclocation,
+ &manifest_inject_streamer,
+ is_recovery_guc_supported);
+ state.tablespacenum = tablespacenum;
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ progress_filename = NULL;
/*
- * Normally, we emit the backup manifest as a separate file, but when
- * we're writing a tarfile to stdout, we don't have that option, so
- * include it in the one tarfile we've got.
+ * The decision as to whether we need to inject the backup manifest into
+ * the output at this stage is made by CreateBackupStreamer; if that is
+ * needed, manifest_inject_streamer will be non-NULL; otherwise, it will
+ * be NULL.
*/
- if (strcmp(basedir, "-") == 0 && manifest)
+ if (manifest_inject_streamer != NULL)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
+ /* Slurp the entire backup manifest into a buffer. */
initPQExpBuffer(&buf);
ReceiveBackupManifestInMemory(conn, &buf);
if (PQExpBufferDataBroken(buf))
@@ -1250,42 +1127,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
- termPQExpBuffer(&buf);
- }
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
+ /* Inject it into the output tarfile. */
+ bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ buf.data, buf.len);
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
+ /* Free memory. */
+ termPQExpBuffer(&buf);
}
- progress_report(rownum, state.filename, true, false);
+ /* Cleanup. */
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+
+ progress_report(tablespacenum, true, false);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1301,184 +1156,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
-
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
+ bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
+ progress_report(state->tablespacenum, false, false);
}
@@ -1503,242 +1184,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true, false);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2031,16 +1476,32 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
+ /* Receive a tar file for each tablespace in turn */
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named base.tar
+ * if it's the main data directory or <tablespaceoid>.tar if it's for
+ * another tablespace. CreateBackupStreamer() will arrange to add .gz
+ * to the archive name if pg_basebackup is performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
+ }
/*
* Now receive backup manifest, if appropriate.
@@ -2056,7 +1517,10 @@ BaseBackup(void)
ReceiveBackupManifest(conn);
if (showprogress)
- progress_report(PQntuples(res), NULL, true, true);
+ {
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true, true);
+ }
PQclear(res);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49b119a6cb..b916f09165 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3769,3 +3769,13 @@ bbsink
bbsink_ops
bbsink_state
bbsink_throttle
+bbstreamer
+bbstreamer_archive_context
+bbstreamer_extractor
+bbstreamer_gzip_writer
+bbstreamer_member
+bbstreamer_ops
+bbstreamer_plain_writer
+bbstreamer_recovery_injector
+bbstreamer_tar_archiver
+bbstreamer_tar_parser
--
2.24.3 (Apple Git-128)
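A note on how the pieces in the patch above fit together:
CreateBackupStreamer() builds the chain innermost-first, with each
bbstreamer forwarding to the one it wraps. As a rough sketch (illustrative
only, not lifted verbatim from the patch), a tar-format backup that needs
recovery settings injected ends up with a chain like this:

    /* innermost: write raw bytes to the output file */
    streamer = bbstreamer_plain_writer_new(archive_filename, archive_file);

    /* rebuild tar headers for any members we modify or inject */
    streamer = bbstreamer_tar_archiver_new(streamer);

    /* add postgresql.auto.conf / standby.signal contents */
    streamer = bbstreamer_recovery_injector_new(streamer,
                                                is_recovery_guc_supported,
                                                recoveryconfcontents);

    /* outermost: split the raw COPY stream into tar member chunks */
    streamer = bbstreamer_tar_parser_new(streamer);

Each COPY chunk from the server then enters at the outermost level, which
is all that ReceiveTarCopyChunk() has to do:

    bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);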
v5-0002-Flexible-options-for-CREATE_REPLICATION_SLOT.patch
From f2b38f680683bc5317c7ba7140f77e8efcb9aa43 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 10 Sep 2021 11:50:05 -0400
Subject: [PATCH v5 2/8] Flexible options for CREATE_REPLICATION_SLOT.
Like BASE_BACKUP, CREATE_REPLICATION_SLOT has historically used a
hard-coded syntax. To improve future extensibility, adopt a flexible
options syntax here, too.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_receivewal and
pg_recvlogical use it.
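For concreteness, here are the same slot creations in both syntaxes
(illustrative; the exact strings the client tools now generate are in the
streamutil.c and libpqwalreceiver.c hunks below):

    -- old, hard-coded syntax
    CREATE_REPLICATION_SLOT "s1" TEMPORARY LOGICAL "pgoutput" NOEXPORT_SNAPSHOT

    -- new, flexible options syntax
    CREATE_REPLICATION_SLOT "s1" TEMPORARY LOGICAL "pgoutput" (SNAPSHOT 'nothing')

    -- physical slot reserving WAL, new syntax
    CREATE_REPLICATION_SLOT "s2" PHYSICAL (RESERVE_WAL)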
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
---
doc/src/sgml/protocol.sgml | 37 ++++++++++++-----
.../libpqwalreceiver/libpqwalreceiver.c | 16 ++++----
src/backend/replication/repl_gram.y | 35 +++++++++-------
src/backend/replication/walsender.c | 40 ++++++++++--------
src/bin/pg_basebackup/streamutil.c | 41 +++++++++++++++----
5 files changed, 110 insertions(+), 59 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 32d1eeabdc..31bf5a7ffd 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1914,7 +1914,7 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry id="protocol-replication-create-slot" xreflabel="CREATE_REPLICATION_SLOT">
- <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable> [ <literal>TEMPORARY</literal> ] { <literal>PHYSICAL</literal> [ <literal>RESERVE_WAL</literal> ] | <literal>LOGICAL</literal> <replaceable class="parameter">output_plugin</replaceable> [ <literal>EXPORT_SNAPSHOT</literal> | <literal>NOEXPORT_SNAPSHOT</literal> | <literal>USE_SNAPSHOT</literal> | <literal>TWO_PHASE</literal> ] }
+ <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable> [ <literal>TEMPORARY</literal> ] { <literal>PHYSICAL</literal> | <literal>LOGICAL</literal> } [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
<indexterm><primary>CREATE_REPLICATION_SLOT</primary></indexterm>
</term>
<listitem>
@@ -1954,46 +1954,50 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+ </variablelist>
+
+ <para>The following options are supported:</para>
+ <variablelist>
<varlistentry>
- <term><literal>TWO_PHASE</literal></term>
+ <term><literal>TWO_PHASE [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Specify that this logical replication slot supports decoding of two-phase
+ If true, this logical replication slot supports decoding of two-phase
transactions. With this option, two-phase commands like
<literal>PREPARE TRANSACTION</literal>, <literal>COMMIT PREPARED</literal>
and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
The transaction will be decoded and transmitted at
<literal>PREPARE TRANSACTION</literal> time.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>RESERVE_WAL</literal></term>
+ <term><literal>RESERVE_WAL [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Specify that this physical replication slot reserves <acronym>WAL</acronym>
+ If true, this physical replication slot reserves <acronym>WAL</acronym>
immediately. Otherwise, <acronym>WAL</acronym> is only reserved upon
connection from a streaming replication client.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>EXPORT_SNAPSHOT</literal></term>
- <term><literal>NOEXPORT_SNAPSHOT</literal></term>
- <term><literal>USE_SNAPSHOT</literal></term>
+ <term><literal>SNAPSHOT { 'export' | 'use' | 'nothing' }</literal></term>
<listitem>
<para>
Decides what to do with the snapshot created during logical slot
- initialization. <literal>EXPORT_SNAPSHOT</literal>, which is the default,
+ initialization. <literal>'export'</literal>, which is the default,
will export the snapshot for use in other sessions. This option can't
- be used inside a transaction. <literal>USE_SNAPSHOT</literal> will use the
+ be used inside a transaction. <literal>'use'</literal> will use the
snapshot for the current transaction executing the command. This
option must be used in a transaction, and
<literal>CREATE_REPLICATION_SLOT</literal> must be the first command
- run in that transaction. Finally, <literal>NOEXPORT_SNAPSHOT</literal> will
+ run in that transaction. Finally, <literal>'nothing'</literal> will
just use the snapshot for logical decoding as normal but won't do
anything else with it.
</para>
@@ -2052,6 +2056,17 @@ The commands accepted in replication mode are:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable> [ <literal>TEMPORARY</literal> ] { <literal>PHYSICAL</literal> [ <literal>RESERVE_WAL</literal> ] | <literal>LOGICAL</literal> <replaceable class="parameter">output_plugin</replaceable> [ <literal>EXPORT_SNAPSHOT</literal> | <literal>NOEXPORT_SNAPSHOT</literal> | <literal>USE_SNAPSHOT</literal> | <literal>TWO_PHASE</literal> ] }
+ </term>
+ <listitem>
+ <para>
+ For compatibility with older releases, this alternative syntax for
+ the <literal>CREATE_REPLICATION_SLOT</literal> command is still supported.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
<indexterm><primary>START_REPLICATION</primary></indexterm>
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 19ea159af4..e3a783ebec 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -872,26 +872,28 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
if (conn->logical)
{
- appendStringInfoString(&cmd, " LOGICAL pgoutput");
- if (two_phase)
- appendStringInfoString(&cmd, " TWO_PHASE");
+ appendStringInfoString(&cmd, " LOGICAL pgoutput (");
switch (snapshot_action)
{
case CRS_EXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " EXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, "SNAPSHOT 'export'");
break;
case CRS_NOEXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " NOEXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, "SNAPSHOT 'nothing'");
break;
case CRS_USE_SNAPSHOT:
- appendStringInfoString(&cmd, " USE_SNAPSHOT");
+ appendStringInfoString(&cmd, "SNAPSHOT 'use'");
break;
}
+
+ if (two_phase)
+ appendStringInfoString(&cmd, ", TWO_PHASE");
+ appendStringInfoChar(&cmd, ')');
}
else
{
- appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+ appendStringInfoString(&cmd, " PHYSICAL (RESERVE_WAL)");
}
res = libpqrcv_PQexec(conn->streamConn, cmd.data);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index ce51a5e322..e5f66610c3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -103,8 +103,8 @@ static SQLCmd *make_sqlcmd(void);
%type <node> plugin_opt_arg
%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
-%type <list> create_slot_opt_list
-%type <defelt> create_slot_opt
+%type <list> create_slot_options create_slot_legacy_opt_list
+%type <defelt> create_slot_legacy_opt
%%
@@ -243,8 +243,8 @@ base_backup_legacy_opt:
;
create_replication_slot:
- /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
- K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL [options] */
+ K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -254,8 +254,8 @@ create_replication_slot:
cmd->options = $5;
$$ = (Node *) cmd;
}
- /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
- | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin [options] */
+ | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -268,28 +268,33 @@ create_replication_slot:
}
;
-create_slot_opt_list:
- create_slot_opt_list create_slot_opt
+create_slot_options:
+ '(' generic_option_list ')' { $$ = $2; }
+ | create_slot_legacy_opt_list { $$ = $1; }
+ ;
+
+create_slot_legacy_opt_list:
+ create_slot_legacy_opt_list create_slot_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-create_slot_opt:
+create_slot_legacy_opt:
K_EXPORT_SNAPSHOT
{
- $$ = makeDefElem("export_snapshot",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("snapshot",
+ (Node *)makeString("export"), -1);
}
| K_NOEXPORT_SNAPSHOT
{
- $$ = makeDefElem("export_snapshot",
- (Node *)makeInteger(false), -1);
+ $$ = makeDefElem("snapshot",
+ (Node *)makeString("nothing"), -1);
}
| K_USE_SNAPSHOT
{
- $$ = makeDefElem("use_snapshot",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("snapshot",
+ (Node *)makeString("use"), -1);
}
| K_RESERVE_WAL
{
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..b811a5c0ef 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -872,26 +872,30 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
{
DefElem *defel = (DefElem *) lfirst(lc);
- if (strcmp(defel->defname, "export_snapshot") == 0)
+ if (strcmp(defel->defname, "snapshot") == 0)
{
+ char *action;
+
if (snapshot_action_given || cmd->kind != REPLICATION_KIND_LOGICAL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
+ action = defGetString(defel);
snapshot_action_given = true;
- *snapshot_action = defGetBoolean(defel) ? CRS_EXPORT_SNAPSHOT :
- CRS_NOEXPORT_SNAPSHOT;
- }
- else if (strcmp(defel->defname, "use_snapshot") == 0)
- {
- if (snapshot_action_given || cmd->kind != REPLICATION_KIND_LOGICAL)
+
+ if (strcmp(action, "export") == 0)
+ *snapshot_action = CRS_EXPORT_SNAPSHOT;
+ else if (strcmp(action, "nothing") == 0)
+ *snapshot_action = CRS_NOEXPORT_SNAPSHOT;
+ else if (strcmp(action, "use") == 0)
+ *snapshot_action = CRS_USE_SNAPSHOT;
+ else
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("conflicting or redundant options")));
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized value for CREATE_REPLICATION_SLOT option \"%s\": \"%s\"",
+ defel->defname, action)));
- snapshot_action_given = true;
- *snapshot_action = CRS_USE_SNAPSHOT;
}
else if (strcmp(defel->defname, "reserve_wal") == 0)
{
@@ -901,7 +905,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
reserve_wal_given = true;
- *reserve_wal = true;
+ *reserve_wal = defGetBoolean(defel);
}
else if (strcmp(defel->defname, "two_phase") == 0)
{
@@ -910,7 +914,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
two_phase_given = true;
- *two_phase = true;
+ *two_phase = defGetBoolean(defel);
}
else
elog(ERROR, "unrecognized option: %s", defel->defname);
@@ -980,7 +984,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'export')")));
need_full_snapshot = true;
}
@@ -990,25 +994,25 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
if (XactIsoLevel != XACT_REPEATABLE_READ)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called in REPEATABLE READ isolation mode transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
if (FirstSnapshotSet)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called before any query",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
if (IsSubTransaction())
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called in a subtransaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
need_full_snapshot = true;
}
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index d782b81adc..72fda9f1d0 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -490,6 +490,7 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
{
PQExpBuffer query;
PGresult *res;
+ bool use_new_option_syntax = (PQserverVersion(conn) >= 150000);
query = createPQExpBuffer();
@@ -498,27 +499,51 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
Assert(!(two_phase && is_physical));
Assert(slot_name != NULL);
- /* Build query */
+ /* Build base portion of query */
appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
if (is_temporary)
appendPQExpBufferStr(query, " TEMPORARY");
if (is_physical)
- {
appendPQExpBufferStr(query, " PHYSICAL");
+ else
+ appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+
+ /* Add any requested options */
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(query, " (");
+ if (is_physical)
+ {
if (reserve_wal)
- appendPQExpBufferStr(query, " RESERVE_WAL");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "RESERVE_WAL");
}
else
{
- appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
if (two_phase && PQserverVersion(conn) >= 150000)
- appendPQExpBufferStr(query, " TWO_PHASE");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "TWO_PHASE");
- if (PQserverVersion(conn) >= 100000)
- /* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ /* pg_recvlogical doesn't use an exported snapshot, so suppress */
+ if (use_new_option_syntax)
+ AppendStringCommandOption(query, use_new_option_syntax,
+ "SNAPSHOT", "nothing");
+ else
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "NOEXPORT_SNAPSHOT");
+ }
+ if (use_new_option_syntax)
+ {
+ /* Suppress option list if it would be empty, otherwise terminate */
+ if (query->data[query->len - 1] == '(')
+ {
+ query->len -= 2;
+ query->data[query->len] = '\0';
+ }
+ else
+ appendPQExpBufferChar(query, ')');
}
+ /* Now run the query */
res = PQexec(conn, query->data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
--
2.24.3 (Apple Git-128)
v5-0003-Refactor-basebackup.c-s-_tarWriteDir-function.patch
From d864481035cdb9f6a791c353b7c13100f2b0d51f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v5 3/8] Refactor basebackup.c's _tarWriteDir() function.
Sometimes, we replace a symbolic link that we find in the data
directory with an actual directory within the tarfile that we
create. _tarWriteDir was responsible both for making this
substitution and also for writing the tar header for the
resulting directory into the tar file. Make it do only the first
of those things, and rename to convert_link_to_directory.
Substantially larger refactoring of this source file is planned,
but this little bit seemed to make sense to commit
independently.
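With this change, a call site that previously went through _tarWriteDir()
spells out both steps itself, as in the hunks below:

    /* if pathbuf is a symlink, pretend it's a plain directory */
    convert_link_to_directory(pathbuf, &statbuf);
    size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
                            sizeonly);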
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b0b52d3b1a..7d1ddd2f9f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -71,8 +71,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1371,7 +1370,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1387,7 +1388,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1399,7 +1402,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1873,12 +1878,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1887,8 +1891,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.3 (Apple Git-128)
v5-0004-Introduce-bbsink-abstraction-to-modularize-base-b.patch
From 86a0158e5bf5b53c2f688a3c449205ad9831e15a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 11:45:50 -0400
Subject: [PATCH v5 4/8] Introduce 'bbsink' abstraction to modularize base
backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup process and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc', but in the future we might introduce
other options.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
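As a quick orientation before the diff: the chain is assembled
innermost-first in perform_base_backup(), so the optional throttling and
progress-reporting stages simply wrap the eventual 'copytblspc' sink
(this mirrors the hunk below):

    bbsink *sink = bbsink_copytblspc_new();

    /* Set up network throttling, if client requested it */
    if (opt->maxrate > 0)
        sink = bbsink_throttle_new(sink, opt->maxrate);

    /* Set up progress reporting. */
    sink = bbsink_progress_new(sink, opt->progress);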
---
src/backend/replication/Makefile | 4 +
src/backend/replication/backup_manifest.c | 28 +-
src/backend/replication/basebackup.c | 674 +++++-------------
src/backend/replication/basebackup_copy.c | 324 +++++++++
src/backend/replication/basebackup_progress.c | 250 +++++++
src/backend/replication/basebackup_sink.c | 115 +++
src/backend/replication/basebackup_throttle.c | 198 +++++
src/include/replication/backup_manifest.h | 5 +-
src/include/replication/basebackup_sink.h | 275 +++++++
src/tools/pgindent/typedefs.list | 4 +
10 files changed, 1363 insertions(+), 514 deletions(-)
create mode 100644 src/backend/replication/basebackup_copy.c
create mode 100644 src/backend/replication/basebackup_progress.c
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/backend/replication/basebackup_throttle.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..74b97cf126 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,10 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_copy.o \
+ basebackup_progress.o \
+ basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 04ca455ace..4fe11a3b5c 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -310,9 +311,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -352,38 +352,28 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
- *
- * We choose to read back the data from the temporary file in chunks of
- * size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
- * size, so it seems to make sense to match that value here.
+ * Send the backup manifest.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
- char manifestbuf[BLCKSZ];
size_t bytes_to_read;
size_t rc;
- bytes_to_read = Min(sizeof(manifestbuf),
+ bytes_to_read = Min(sink->bbs_buffer_length,
manifest->manifest_size - manifest_bytes_done);
- rc = BufFileRead(manifest->buffile, manifestbuf, bytes_to_read);
+ rc = BufFileRead(manifest->buffile, sink->bbs_buffer,
+ bytes_to_read);
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 7d1ddd2f9f..ecd32e8436 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -46,6 +43,16 @@
#include "utils/resowner.h"
#include "utils/timestamp.h"
+/*
+ * How much data do we want to send in one CopyData message? Note that
+ * this may also result in reading the underlying files in chunks of this
+ * size.
+ *
+ * NB: The buffer size is required to be a multiple of the system block
+ * size, so use that value instead if it's bigger than our preference.
+ */
+#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+
typedef struct
{
const char *label;
@@ -59,27 +66,25 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
+static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -90,46 +95,12 @@ static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
-/*
- * Size of each block sent into the tar stream for larger files.
- */
-#define TAR_SEND_SIZE 32768
-
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
-/* The starting XLOG position of the base backup. */
-static XLogRecPtr startptr;
-
/* Total number of checksum failures during base backup. */
static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -255,30 +226,29 @@ static const struct exclude_list_item noChecksumFiles[] = {
static void
perform_base_backup(basebackup_options *opt)
{
- TimeLineID starttli;
+ bbsink_state state;
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- List *tablespaces = NIL;
+ bbsink *sink = bbsink_copytblspc_new();
+ bbsink *progress_sink;
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ /* Initial backup state, insofar as we know it now. */
+ state.tablespaces = NIL;
+ state.tablespace_num = 0;
+ state.bytes_done = 0;
+ state.bytes_total = 0;
+ state.bytes_total_is_valid = false;
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -295,11 +265,11 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
- startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
- tblspc_map_file);
+ basebackup_progress_wait_checkpoint();
+ state.startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint,
+ &state.starttli,
+ labelfile, &state.tablespaces,
+ tblspc_map_file);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -312,7 +282,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -329,7 +298,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = -1;
- tablespaces = lappend(tablespaces, ti);
+ state.tablespaces = lappend(state.tablespaces, ti);
/*
* Calculate the total backup size by summing up the size of each
@@ -337,100 +306,53 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
- NULL);
+ tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
+ true, NULL, NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
+ state.bytes_total += tmp->size;
}
+ state.bytes_total_is_valid = true;
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
-
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, &state, SINK_BUFFER_LENGTH);
/* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
+ sendDir(sink, ".", 1, false, state.tablespaces,
+ sendtblspclinks, &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -438,32 +360,33 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ Assert(lnext(state.tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
+ bbsink_end_archive(sink);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -489,8 +412,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -501,7 +423,7 @@ perform_base_backup(basebackup_options *opt)
* shouldn't be such files, but if there are, there's little harm in
* including them.
*/
- XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLByteToSeg(state.startptr, startsegno, wal_segment_size);
XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
@@ -591,7 +513,6 @@ perform_base_backup(basebackup_options *opt)
{
char *walFileName = (char *) lfirst(lc);
int fd;
- char buf[TAR_SEND_SIZE];
size_t cnt;
pgoff_t len = 0;
@@ -630,22 +551,17 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
- while ((cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf),
+ while ((cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length,
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -674,7 +590,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +613,23 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
+ AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
+ endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -739,7 +655,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -951,155 +867,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", LSN_FORMAT_ARGS(ptr));
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
+ int bytes_done = 0,
len;
pg_checksum_context checksum_ctx;
@@ -1125,25 +901,23 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
- update_basebackup_progress(len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
+ elog(ERROR, "could not update checksum of file \"%s\"",
+ filename);
+
+ while (bytes_done < len)
{
- char buf[TAR_BLOCK_SIZE];
+ size_t remaining = len - bytes_done;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
+ memcpy(sink->bbs_buffer, content, nbytes);
+ bbsink_archive_contents(sink, nbytes);
+ bytes_done += nbytes;
}
- if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
- elog(ERROR, "could not update checksum of file \"%s\"",
- filename);
+ _tarWritePadding(sink, len);
AddFileToBackupManifest(manifest, NULL, filename, len,
(pg_time_t) statbuf.st_mtime, &checksum_ctx);
@@ -1157,7 +931,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1187,11 +961,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1210,8 +984,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1371,8 +1145,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1389,8 +1163,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1403,15 +1177,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1442,7 +1216,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1466,7 +1240,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1498,7 +1272,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1506,7 +1280,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1583,21 +1357,19 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
bool block_retry = false;
- char buf[TAR_SEND_SIZE];
uint16 checksum;
int checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
@@ -1618,7 +1390,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1659,9 +1431,11 @@ sendFile(const char *readfilename, const char *tarfilename,
*/
while (len < statbuf->st_size)
{
+ size_t remaining = statbuf->st_size - len;
+
/* Try to read some more data. */
- cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf), statbuf->st_size - len),
+ cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length, remaining),
len, readfilename, true);
/*
@@ -1678,7 +1452,7 @@ sendFile(const char *readfilename, const char *tarfilename,
- * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read a multiple of
+ * the buffer length is divisible by BLCKSZ and we read a multiple of
* BLCKSZ bytes.
*/
- Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
if (verify_checksum && (cnt % BLCKSZ != 0))
{
@@ -1694,7 +1468,7 @@ sendFile(const char *readfilename, const char *tarfilename,
{
for (i = 0; i < cnt / BLCKSZ; i++)
{
- page = buf + BLCKSZ * i;
+ page = sink->bbs_buffer + BLCKSZ * i;
/*
* Only check pages which have not been modified since the
@@ -1704,7 +1478,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* this case. We also skip completely new pages, since they
* don't have a checksum yet.
*/
- if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+ if (!PageIsNew(page) && PageGetLSN(page) < sink->bbs_state->startptr)
{
checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
phdr = (PageHeader) page;
@@ -1726,7 +1500,8 @@ sendFile(const char *readfilename, const char *tarfilename,
/* Reread the failed block */
reread_cnt =
- basebackup_read_file(fd, buf + BLCKSZ * i,
+ basebackup_read_file(fd,
+ sink->bbs_buffer + BLCKSZ * i,
BLCKSZ, len + BLCKSZ * i,
readfilename,
false);
@@ -1773,34 +1548,29 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
/* Also feed it to the checksum machinery. */
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer, cnt) < 0)
elog(ERROR, "could not update checksum of base backup");
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
- if (len < statbuf->st_size)
+ while (len < statbuf->st_size)
{
- MemSet(buf, 0, sizeof(buf));
- while (len < statbuf->st_size)
- {
- cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
- elog(ERROR, "could not update checksum of base backup");
- update_basebackup_progress(cnt);
- len += cnt;
- throttle(cnt);
- }
+ size_t remaining = statbuf->st_size - len;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
+
+ MemSet(sink->bbs_buffer, 0, nbytes);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ nbytes) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ bbsink_archive_contents(sink, nbytes);
+ len += nbytes;
}
/*
@@ -1808,13 +1578,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
- }
+ _tarWritePadding(sink, len);
CloseTransientFile(fd);
@@ -1837,18 +1601,28 @@ sendFile(const char *readfilename, const char *tarfilename,
return true;
}
-
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[TAR_BLOCK_SIZE];
enum tarError rc;
if (!sizeonly)
{
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ /*
+ * As of this writing, the smallest supported block size is 1kB, which
+ * is twice TAR_BLOCK_SIZE. Since the buffer size is required to be a
+ * multiple of BLCKSZ, it should be safe to assume that the buffer is
+ * large enough to fit an entire tar block. We double-check by means of
+ * these assertions.
+ */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= BLCKSZ,
+ "BLCKSZ too small for tar block");
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ rc = tarCreateHeader(sink->bbs_buffer, filename, linktarget,
+ statbuf->st_size, statbuf->st_mode,
+ statbuf->st_uid, statbuf->st_gid,
statbuf->st_mtime);
switch (rc)
@@ -1870,134 +1644,48 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
- update_basebackup_progress(sizeof(h));
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
}
- return sizeof(h);
-}
-
-/*
- * If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
- */
-static void
-convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
-{
- /* If symlink, write it as a directory anyway */
-#ifndef WIN32
- if (S_ISLNK(statbuf->st_mode))
-#else
- if (pgwin32_is_junction(pathbuf))
-#endif
- statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
+ return TAR_BLOCK_SIZE;
}
/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
+ * Pad with zero bytes out to a multiple of TAR_BLOCK_SIZE.
*/
static void
-throttle(size_t increment)
+_tarWritePadding(bbsink *sink, int len)
{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
+ int pad = tarPaddingBytesRequired(len);
/*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
+ * As in _tarWriteHeader, it should be safe to assume that the buffer is
+ * large enough that we don't need to do this in multiple chunks.
*/
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+ Assert(pad <= TAR_BLOCK_SIZE);
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
-
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
+ if (pad > 0)
+ {
+ MemSet(sink->bbs_buffer, 0, pad);
+ bbsink_archive_contents(sink, pad);
}
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
}
/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
static void
-update_basebackup_progress(int64 delta)
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
+ /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+ if (S_ISLNK(statbuf->st_mode))
+#else
+ if (pgwin32_is_junction(pathbuf))
+#endif
+ statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
/*
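To make the new calling convention concrete, here is a distilled sketch --
not part of the patch set itself -- of the pattern that now recurs
throughout basebackup.c. The sink owns bbs_buffer, which gets allocated by
the terminal sink's begin_backup callback; a caller fills it and then tells
the sink how many bytes are there, never writing more than
bbs_buffer_length bytes at a time:

    /* hypothetical caller, following the pattern of sendFile() et al. */
    while (bytes_left > 0)
    {
        size_t  nbytes = Min(sink->bbs_buffer_length, bytes_left);

        memcpy(sink->bbs_buffer, source, nbytes);   /* or read() into it */
        bbsink_archive_contents(sink, nbytes);      /* hand bytes to sink */
        source += nbytes;
        bytes_left -= nbytes;
    }

This is also why TAR_SEND_SIZE goes away: the chunk size is now whatever
buffer length was handed to bbsink_begin_backup().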
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
new file mode 100644
index 0000000000..564f010188
--- /dev/null
+++ b/src/backend/replication/basebackup_copy.c
@@ -0,0 +1,324 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_copy.c
+ * send basebackup archives using one COPY OUT operation per
+ * tablespace, and an additional COPY OUT for the backup manifest
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_copy.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_copytblspc_begin_backup(bbsink *sink);
+static void bbsink_copytblspc_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copytblspc_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_archive(bbsink *sink);
+static void bbsink_copytblspc_begin_manifest(bbsink *sink);
+static void bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_manifest(bbsink *sink);
+static void bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendTablespaceList(List *tablespaces);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+
+const bbsink_ops bbsink_copytblspc_ops = {
+ .begin_backup = bbsink_copytblspc_begin_backup,
+ .begin_archive = bbsink_copytblspc_begin_archive,
+ .archive_contents = bbsink_copytblspc_archive_contents,
+ .end_archive = bbsink_copytblspc_end_archive,
+ .begin_manifest = bbsink_copytblspc_begin_manifest,
+ .manifest_contents = bbsink_copytblspc_manifest_contents,
+ .end_manifest = bbsink_copytblspc_end_manifest,
+ .end_backup = bbsink_copytblspc_end_backup
+};
+
+/*
+ * Create a new 'copytblspc' bbsink.
+ */
+bbsink *
+bbsink_copytblspc_new(void)
+{
+ bbsink *sink = palloc0(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
+
+ return sink;
+}
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_copytblspc_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ /* Create a suitable buffer. */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is sent as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_copytblspc_archive_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", LSN_FORMAT_ARGS(ptr));
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a result set via libpq describing the tablespace list.
+ */
+static void
+SendTablespaceList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
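As far as I can tell, this file preserves the existing wire protocol
exactly; it just relocates the code that speaks it. For reviewers, a
summary -- not a protocol change -- of what each callback emits:

    begin_backup      -> result set with start LSN/TLI, result set with
                         the tablespace list, CommandComplete
    begin_archive     -> CopyOutResponse ('H')
    archive_contents  -> one CopyData ('d') message per buffered chunk
    end_archive       -> CopyDone ('c')
    begin_manifest /
      manifest_contents /
      end_manifest    -> one more COPY OUT stream of the same shape
    end_backup        -> result set with end LSN/TLI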
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..79f4d9dea3
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress tracking, including but not
+ * limited to command progress reporting.
+ *
+ * This should be used even if the PROGRESS option to the replication
+ * command BASE_BACKUP is not specified. Without that option, we won't
+ * have tallied up the size of the files that are going to need to be
+ * backed up, but we can still report to the command progress reporting
+ * facility how much data we've processed.
+ *
+ * Moreover, we also use this as a convenient place to update certain
+ * fields of the bbsink_state. That work is accurately described as
+ * keeping track of our progress, but it's not just for introspection.
+ * We need those fields to be updated properly in order for base backups
+ * to work.
+ *
+ * This particular basebackup sink requires extra callbacks that most base
+ * backup sinks don't. Rather than cramming those into the interface, we just
+ * have a few extra functions here that basebackup.c can call. (We could put
+ * the logic directly into that file as it's fairly simple, but it seems
+ * cleaner to have everything related to progress reporting in one place.)
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+static void bbsink_progress_begin_backup(bbsink *sink);
+static void bbsink_progress_archive_contents(bbsink *sink, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress tracking functions and
+ * forwards data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink));
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_progress_ops;
+ sink->bbs_next = next;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of the
+ * backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL, -1);
+
+ return sink;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+
+ /*
+ * Report that we are now streaming database files as a base backup. Also
+ * advertise the number of tablespaces, and, if known, the estimated total
+ * backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ if (sink->bbs_state->bytes_total_is_valid)
+ val[1] = sink->bbs_state->bytes_total;
+ else
+ val[1] = -1;
+ val[2] = list_length(sink->bbs_state->tablespaces);
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ bbsink_forward_begin_backup(sink);
+}
+
+/*
+ * End-of-archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ /*
+ * We expect one archive per tablespace, so reaching the end of an archive
+ * also means reaching the end of a tablespace. (Some day we might have a
+ * reason to decouple these concepts.)
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (sink->bbs_state->tablespace_num < list_length(sink->bbs_state->tablespaces))
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ sink->bbs_state->tablespace_num + 1);
+
+ /* Delegate to next sink. */
+ bbsink_forward_end_archive(sink);
+
+ /*
+ * This is a convenient place to update the bbsink_state's notion of which
+ * is the current tablespace. Note that the bbsink_state object is shared
+ * across all bbsink objects involved, but we're the outermost one and
+ * this is the very last thing we do.
+ */
+ sink->bbs_state->tablespace_num++;
+}
+
+/*
+ * Handle progress tracking for new archive contents.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+
+ /* First update bbsink_state with # of bytes done. */
+ state->bytes_done += len;
+
+ /* Now forward to next sink. */
+ bbsink_forward_archive_contents(sink, len);
+
+ /* Prepare to set # of bytes done for command progress reporting. */
+ val[nparam++] = state->bytes_done;
+
+ /*
+ * We may also want to update # of total bytes, to avoid overflowing past
+ * 100% or the full size. This may make the total size number change as we
+ * approach the end of the backup (the estimate will always be wrong if
+ * WAL is included), but that's better than having the done column be
+ * bigger than the total.
+ */
+ if (state->bytes_total_is_valid && state->bytes_done > state->bytes_total)
+ val[nparam++] = state->bytes_done;
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+ Assert(state->tablespace_num >= list_length(state->tablespaces) - 1);
+ Assert(state->tablespace_num <= list_length(state->tablespaces));
+
+ /*
+ * We report having finished all tablespaces at this point, even if the
+ * archive for the main tablespace is still open, because what's going to
+ * be added is WAL files, not files that are really from the main
+ * tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = list_length(state->tablespaces);
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
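One ordering subtlety that is easy to miss: bbsink_progress_archive_contents
updates bytes_done before forwarding, and bbsink_progress_end_archive bumps
tablespace_num after forwarding, so this sink is intended to be the
outermost one in the chain. That is how perform_base_backup builds it:

    bbsink *sink = bbsink_copytblspc_new();

    if (opt->maxrate > 0)
        sink = bbsink_throttle_new(sink, opt->maxrate);
    sink = progress_sink = bbsink_progress_new(sink, opt->progress);

Since the bbsink_state is shared by every sink in the chain, having the
outermost sink perform these updates ensures they happen exactly once per
callback.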
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..14104f50e8
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,115 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+/*
+ * Forward begin_backup callback.
+ *
+ * Only use this implementation if you want the bbsink you're implementing to
+ * share a buffer with the successor bbsink.
+ */
+void
+bbsink_forward_begin_backup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_state != NULL);
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+ sink->bbs_buffer = sink->bbs_next->bbs_buffer;
+}
+
+/*
+ * Forward begin_archive callback.
+ */
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+/*
+ * Forward archive_contents callback.
+ *
+ * Code that wants to use this should initialize its own bbs_buffer and
+ * bbs_buffer_length fields to the values from the successor sink. In cases
+ * where the buffer isn't shared, the data needs to be copied before forwarding
+ * the callback. We don't try to do that here, because there's really no
+ * reason to have separately allocated buffers containing the same data.
+ */
+void
+bbsink_forward_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_archive_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_archive callback.
+ */
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * Forward begin_manifest callback.
+ */
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward manifest_contents callback.
+ *
+ * As with the archive_contents callback, it's expected that the buffer is
+ * shared.
+ */
+void
+bbsink_forward_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_manifest callback.
+ */
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward end_backup callback.
+ */
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
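These forwarding routines are the building blocks for filter sinks. As an
illustration only -- this type does not exist anywhere in the patch set -- a
do-nothing filter that shares its buffer with its successor reduces to an
ops table full of forwarders plus a constructor; a real filter, say a
compression sink, would start from this and override archive_contents:

    typedef struct bbsink_noop
    {
        bbsink      base;       /* common bbsink state */
    } bbsink_noop;

    static const bbsink_ops bbsink_noop_ops = {
        .begin_backup = bbsink_forward_begin_backup,
        .begin_archive = bbsink_forward_begin_archive,
        .archive_contents = bbsink_forward_archive_contents,
        .end_archive = bbsink_forward_end_archive,
        .begin_manifest = bbsink_forward_begin_manifest,
        .manifest_contents = bbsink_forward_manifest_contents,
        .end_manifest = bbsink_forward_end_manifest,
        .end_backup = bbsink_forward_end_backup
    };

    bbsink *
    bbsink_noop_new(bbsink *next)
    {
        bbsink_noop *sink = palloc0(sizeof(bbsink_noop));

        *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_noop_ops;
        sink->base.bbs_next = next;

        return &sink->base;
    }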
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..1606463291
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The actual number of bytes, transfer of which may cause sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink);
+static void bbsink_throttle_archive_contents(bbsink *sink, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ bbsink_forward_begin_backup(sink);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+	bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (bbs_buffer_length / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only a whole multiple of throttling_sample has
+ * been processed. The rest will be done during the next call of this
+ * function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 099108910c..16ed7eec9b 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,7 +47,8 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
extern void FreeBackupManifest(backup_manifest_info *manifest);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..3a2206d82f
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,275 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * Taking a base backup produces one archive per tablespace directory,
+ * plus a backup manifest unless that feature has been disabled. The
+ * goal of the backup process is to put those archives and that manifest
+ * someplace, possibly after postprocessing them in some way. A 'bbsink'
+ * is an object to which those archives, and the manifest if present,
+ * can be sent.
+ *
+ * In practice, there will be a chain of 'bbsink' objects rather than
+ * just one, with callbacks being forwarded from one to the next,
+ * possibly with modification. Each object is responsible for a
+ * single task, e.g. command progress reporting, throttling, or
+ * communication with the client.
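+ *
+ * For instance, a chain might be assembled roughly as follows (a sketch of
+ * what perform_base_backup() does with the constructors declared at the
+ * bottom of this file):
+ *
+ *		bbsink *sink = bbsink_copytblspc_new();
+ *
+ *		if (opt->maxrate > 0)
+ *			sink = bbsink_throttle_new(sink, opt->maxrate);
+ *		sink = bbsink_progress_new(sink, opt->progress);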
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Overall backup state shared by all bbsink objects for a backup.
+ *
+ * Before calling bbsink_begin_backup(), the caller must initialize a
+ * bbsink_state object which will last for the lifetime of the backup, and
+ * must thereafter update it as required before each new call to a bbsink
+ * method. The bbsink
+ * will retain a pointer to the state object and will consult it to understand
+ * the progress of the backup.
+ *
+ * 'tablespaces' is a list of tablespaceinfo objects. It must be set before
+ * calling bbsink_begin_backup() and must not be modified thereafter.
+ *
+ * 'tablespace_num' is the index of the current tablespace within the list
+ * stored in 'tablespaces'.
+ *
+ * 'bytes_done' is the number of bytes read so far from $PGDATA.
+ *
+ * 'bytes_total' is the total number of bytes estimated to be present in
+ * $PGDATA, if we have estimated this.
+ *
+ * 'bytes_total_is_valid' is true if and only if a proper estimate has been
+ * stored into 'bytes_total'.
+ *
+ * 'startptr' and 'starttli' identify the point in the WAL stream at which
+ * the backup began. They must be set before calling bbsink_begin_backup()
+ * and must not be modified thereafter.
+ */
+typedef struct bbsink_state
+{
+ List *tablespaces;
+ int tablespace_num;
+ uint64 bytes_done;
+ uint64 bytes_total;
+ bool bytes_total_is_valid;
+ XLogRecPtr startptr;
+ TimeLineID starttli;
+} bbsink_state;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. Its length must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * 'bbs_state' is a pointer to the bbsink_state object for this backup.
+ * Every bbsink associated with this backup should point to the same
+ * underlying state object.
+ *
+ * In general it is expected that the values of these fields are set when
+ * a bbsink is created and that they do not change thereafter. It's OK
+ * to modify the data to which bbs_buffer or bbs_state point, but no changes
+ * should be made to the contents of this struct.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ char *bbs_buffer;
+ int bbs_buffer_length;
+ bbsink *bbs_next;
+ bbsink_state *bbs_state;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline
+ * functions rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /*
+ * This callback is invoked just once, at the very start of the backup.
+ * It must set bbs_buffer to point to a chunk of storage where at least
+ * bbs_buffer_length bytes of data can be written.
+ */
+ void (*begin_backup) (bbsink *sink);
+
+ /*
+ * For each archive transmitted to a bbsink, there will be one call to the
+ * begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ *
+ * Before invoking the archive_contents() callback, the caller should copy
+ * a number of bytes equal to what will be passed as len into bbs_buffer,
+ * but not more than bbs_buffer_length.
+ *
+ * It's generally good if the buffer is as full as possible before the
+ * archive_contents() callback is invoked, but it's not worth expending
+ * extra cycles to make sure it's absolutely 100% full.
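+ *
+ * For example (an illustrative sketch only, not a requirement), a caller
+ * pushing nbytes bytes of content from a local pointer 'data' might loop
+ * like this:
+ *
+ *		while (nbytes > 0)
+ *		{
+ *			size_t	chunk = Min(nbytes, sink->bbs_buffer_length);
+ *
+ *			memcpy(sink->bbs_buffer, data, chunk);
+ *			bbsink_archive_contents(sink, chunk);
+ *			data += chunk;
+ *			nbytes -= chunk;
+ *		}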
+ */
+ void (*begin_archive) (bbsink *sink, const char *archive_name);
+ void (*archive_contents) (bbsink *sink, size_t len);
+ void (*end_archive) (bbsink *sink);
+
+ /*
+ * If a backup manifest is to be transmitted to a bbsink, there will be
+ * one call to the begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback. These calls will occur after all archives are transmitted.
+ *
+ * The rules for invoking the manifest_contents() callback are the same as
+ * for the archive_contents() callback above.
+ */
+ void (*begin_manifest) (bbsink *sink);
+ void (*manifest_contents) (bbsink *sink, size_t len);
+ void (*end_manifest) (bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup) (bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, bbsink_state *state, int buffer_length)
+{
+ Assert(sink != NULL);
+
+ Assert(buffer_length > 0);
+
+ sink->bbs_state = state;
+ sink->bbs_buffer_length = buffer_length;
+ sink->bbs_ops->begin_backup(sink);
+
+ Assert(sink->bbs_buffer != NULL);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /*
+ * The caller should make a reasonable attempt to fill the buffer before
+ * calling this function, so it shouldn't be completely empty. Nor should
+ * it be filled beyond capacity.
+ */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->archive_contents(sink, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /* See comments in bbsink_archive_contents. */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->manifest_contents(sink, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ Assert(sink->bbs_state->tablespace_num == list_length(sink->bbs_state->tablespaces));
+
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 423780652f..49b119a6cb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3765,3 +3765,7 @@ yyscan_t
z_stream
z_streamp
zic_t
+bbsink
+bbsink_ops
+bbsink_state
+bbsink_throttle
--
2.24.3 (Apple Git-128)
On Mon, Sep 13, 2021 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems like nothing has been done about the issue reported in [1]
This one line change shall fix the issue,
Oops. Try this version.
Thanks, this version works fine.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hello
I found that in 0001 you propose to rename a few options. Probably we could rename another option for clarity? I think the FAST (it's about some bandwidth limit?) and WAIT (wait for what? a checkpoint?) option names are confusing.
Could we replace FAST with "CHECKPOINT [fast|spread]" and WAIT with WAIT_WAL_ARCHIVED? I think such names would be more descriptive.
- if (PQserverVersion(conn) >= 100000)
- /* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ /* pg_recvlogical doesn't use an exported snapshot, so suppress */
+ if (use_new_option_syntax)
+ AppendStringCommandOption(query, use_new_option_syntax,
+ "SNAPSHOT", "nothing");
+ else
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "NOEXPORT_SNAPSHOT");
In 0002, it looks like the condition for 9.x releases was lost?
Also my gcc version 8.3.0 is not happy with v5-0007-Support-base-backup-targets.patch and produces:
basebackup.c: In function ‘parse_basebackup_options’:
basebackup.c:970:7: error: ‘target_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
errmsg("target '%s' does not accept a target detail",
^~~~~~
regards, Sergei
Thanks for the newer set of the patches Robert!
I was wondering if we should change the bbs_buffer_length in bbsink to
be size_t instead of int, because that's what most of the compression
libraries have their length variables defined as.
Regards,
Jeevan Ladhe
On Mon, Sep 13, 2021 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems like nothing has been done about the issue reported in [1]
This one line change shall fix the issue,
Oops. Try this version.
--
Robert Haas
EDB: http://www.enterprisedb.com
+ /*
+  * LZ4F_compressUpdate() returns the number of bytes written into output
+  * buffer. We need to keep track of how many bytes have been cumulatively
+  * written into the output buffer (bytes_written). But,
+  * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+  * written to output buffer, set autoFlush to 1 to force the writing to the
+  * output buffer.
+  */
+ prefs->autoFlush = 1;

I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?

This is needed to know the actual bytes written in the output buffer. If it
is set to 0, then LZ4F_compressUpdate() would return either 0 or the actual
bytes written to the output buffer, depending on whether it has buffered or
really flushed data to the output buffer.
The problem is that if we autoflush, I think it will cause the
compression ratio to be less good. Try un-lz4ing a file that is
produced this way and then re-lz4 it and compare the size of the
re-lz4'd file to the original one. Compressors rely on postponing
decisions about how to compress until they've seen as much of the
input as possible, and flushing forces them to decide earlier, and
maybe make a decision that isn't as good as it could have been. So I
believe we should look for a way of avoiding this. Now I realize
there's a problem there with doing that and also making sure the
output buffer is large enough, and I'm not quite sure how we solve
that problem, but there is probably a way to do it.
Yes, you are right here, and I could verify this fact with an experiment.
When autoflush is 1, the file gets less compressed, i.e. the compressed file
is larger than the one generated when autoflush is set to 0.
But, as of now, I couldn't think of a solution, as we need to know how many
bytes were really written to the output buffer so that we can keep writing
into it at the correct offset.
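One direction that might be worth experimenting with (an untested sketch on
top of the attached patch, keeping autoFlush = 0; 'required' is a new local,
the other variables are the patch's own): before each LZ4F_compressUpdate()
call, check whether the successor's buffer still has
LZ4F_compressBound(nextChunkLen, &prefs) bytes free, and drain it to the next
sink first if it does not. Since LZ4F_compressBound() accounts for data the
compression context may have buffered internally, a return value of 0 from
LZ4F_compressUpdate() is then harmless:

    while (avail_in > 0)
    {
        size_t  nextChunkLen = Min(avail_in, CHUNK_SIZE);
        size_t  required = LZ4F_compressBound(nextChunkLen, &mysink->prefs);
        size_t  compressedSize;

        /* Drain the output buffer first if the worst case might not fit. */
        if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
            required)
        {
            bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
            mysink->bytes_written = 0;
        }

        compressedSize = LZ4F_compressUpdate(mysink->ctx,
            mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
            mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
            next_in, nextChunkLen, NULL);

        if (LZ4F_isError(compressedSize))
            elog(ERROR, "could not compress data: %s",
                 LZ4F_getErrorName(compressedSize));

        /* A return of 0 just means lz4 buffered the input internally. */
        mysink->bytes_written += compressedSize;
        next_in += nextChunkLen;
        avail_in -= nextChunkLen;
    }

The begin_backup callback would still have to size the successor's buffer to
at least LZ4F_compressBound(CHUNK_SIZE, &prefs), which the next_buf_len
computation in bbsink_lz4_begin_backup() already appears to guarantee.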
Regards,
Jeevan Ladhe
Hi Robert,
Here is a patch for lz4 based on the v5 set of patches. The patch adapts to
the bbsink changes, and now makes provision for the required output buffer
length using the new callback function bbsink_lz4_begin_backup().
Sample command to take backup:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4
Please let me know your thoughts.
Regards,
Jeevan Ladhe
On Mon, Sep 13, 2021 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Seems like nothing has been done about the issue reported in [1]
This one line change shall fix the issue,
Oops. Try this version.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
lz4_compress_v2.patch (application/octet-stream)
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d6df3fdeb2..6e804c0d74 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -303,6 +304,8 @@ perform_base_backup(basebackup_options *opt)
/* Set up server-side compression, if client requested it */
if (opt->compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt->compression_level);
+ if (opt->compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -936,6 +939,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..12cd33a196
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,303 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Read the input buffer in chunks of CHUNK_SIZE bytes in each iteration and
+ * pass them to lz4 compression. Defined as 8kB, since the input buffer is a
+ * multiple of BLCKSZ, i.e. a multiple of 8kB.
+ */
+#define CHUNK_SIZE 8192
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+ size_t output_buffer_bound;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t next_buf_len;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Remember the compressed buffer bound needed for input buffer to avoid
+ * recomputation in bbsink_lz4_archive_contents().
+ */
+ mysink->output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer of size equal to
+ * or greater than what LZ4F_compressBound() returns, make sure the next
+ * sink's bbs_buffer is long enough to accommodate the compressed input
+ * buffer.
+ */
+ next_buf_len = mysink->base.bbs_buffer_length + mysink->output_buffer_bound;
+
+ /*
+ * Round it up to the next multiple of BLCKSZ, since the buffer length is expected to be one.
+ */
+ next_buf_len = next_buf_len + BLCKSZ - (next_buf_len % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, next_buf_len);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+ size_t headerSize;
+
+ /* Initialize compressor object. */
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->frameInfo.blockMode = LZ4F_blockLinked;
+ prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+ prefs->frameInfo.frameType = LZ4F_frame;
+ prefs->frameInfo.contentSize = 0;
+ prefs->frameInfo.dictID = 0;
+ prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+ prefs->compressionLevel = 0;
+
+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into the
+ * output buffer. We need to keep track of how many bytes have been
+ * cumulatively written into the output buffer (bytes_written). But
+ * LZ4F_compressUpdate() returns 0 in case the data is merely buffered and
+ * not written to the output buffer, so set autoFlush to 1 to force the
+ * writing to the output buffer.
+ */
+ prefs->autoFlush = 1;
+
+ prefs->favorDecSpeed = 0;
+ prefs->reserved[0] = 0;
+ prefs->reserved[1] = 0;
+ prefs->reserved[2] = 0;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ CHUNK_SIZE,
+ prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ uint8 *next_in = (uint8 *) mysink->base.bbs_buffer;
+
+ while (avail_in > 0)
+ {
+ size_t compressedSize;
+ int nextChunkLen = CHUNK_SIZE;
+
+ /* Last chunk to be read from the input. */
+ if (avail_in < CHUNK_SIZE)
+ nextChunkLen = avail_in;
+
+ /*
+ * Read nextChunkLen bytes of data from the input buffer and write the
+ * compressed output into the unused portion of the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ next_in,
+ nextChunkLen,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Advance the input start since we already read some data. */
+ next_in = (uint8 *) next_in + nextChunkLen;
+ avail_in = avail_in - nextChunkLen;
+
+ /*
+ * If the space remaining in the output buffer falls short of the upper
+ * bound decided by LZ4F_compressBound(), send the accumulated contents
+ * to the next sink for further processing.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length -
+ mysink->bytes_written) < mysink->output_buffer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * Finalize the lz4 frame and then get that forwarded to the successor sink
+ * as archive content. Then, we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index f09aecb53b..84dc305d56 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -264,6 +264,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
On Tue, Sep 14, 2021 at 11:30 AM Sergei Kornilov <sk@zsrv.org> wrote:
I found that in 0001 you propose to rename a few options. Probably we could rename another option for clarity? I think the FAST (it's about some bandwidth limit?) and WAIT (wait for what? a checkpoint?) option names are confusing.
Could we replace FAST with "CHECKPOINT [fast|spread]" and WAIT with WAIT_WAL_ARCHIVED? I think such names would be more descriptive.
I think CHECKPOINT { 'spread' | 'fast' } is probably a good idea; the
options logic for pg_basebackup uses the same convention, and if
somebody ever wanted to introduce a third kind of checkpoint, it would
be a lot easier if you could just make pg_basebackup -cbanana send
CHECKPOINT 'banana' to the server. I don't think renaming WAIT ->
WAIT_WAL_ARCHIVED has much value. The replication grammar isn't really
intended to be consumed directly by end-users, and it's also not clear
that WAIT_WAL_ARCHIVED would attract more support than any of 5 or 10
other possible variants. I'd rather leave it alone.
- if (PQserverVersion(conn) >= 100000)
- /* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ /* pg_recvlogical doesn't use an exported snapshot, so suppress */
+ if (use_new_option_syntax)
+ AppendStringCommandOption(query, use_new_option_syntax,
+ "SNAPSHOT", "nothing");
+ else
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "NOEXPORT_SNAPSHOT");

In 0002, it looks like the condition for 9.x releases was lost?
Good catch, thanks.
I'll post an updated version of these two patches on the thread
dedicated to those two patches, which can be found at
/messages/by-id/CA+Tgmob2cbCPNbqGoixp0J6aib0p00XZerswGZwx-5G=0M+BMA@mail.gmail.com
Also my gcc version 8.3.0 is not happy with v5-0007-Support-base-backup-targets.patch and produces:
basebackup.c: In function ‘parse_basebackup_options’:
basebackup.c:970:7: error: ‘target_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
errmsg("target '%s' does not accept a target detail",
^~~~~~
OK, I'll fix that. Thanks.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 21, 2021 at 7:54 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
I was wondering if we should change the bbs_buffer_length in bbsink to
be size_t instead of int, because that's what most of the compression
libraries have their length variables defined as.
I looked into this and found that I was already using size_t or Size
in a bunch of related places, so this seems to make sense.
Here's a new patch set, responding also to Sergei's comments.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v6-0006-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch (application/octet-stream)
From b3549ab1fc183b34f46be862e582599940f8f617 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 9 Sep 2021 14:53:04 -0400
Subject: [PATCH v6 6/8] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
---
src/backend/replication/basebackup.c | 62 ++-
src/backend/replication/basebackup_copy.c | 266 ++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
src/tools/pgindent/typedefs.list | 3 +
5 files changed, 722 insertions(+), 53 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0cd118f1f1..7fb7b1cf66 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -81,6 +88,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -233,7 +241,7 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- bbsink *sink = bbsink_copytblspc_new();
+ bbsink *sink;
bbsink *progress_sink;
/* Initial backup state, insofar as we know it now. */
@@ -243,6 +251,16 @@ perform_base_backup(basebackup_options *opt)
state.bytes_total = 0;
state.bytes_total_is_valid = false;
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -383,7 +401,10 @@ perform_base_backup(basebackup_options *opt)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(progress_sink);
@@ -621,6 +642,7 @@ perform_base_backup(basebackup_options *opt)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -688,8 +710,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -830,6 +854,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -1682,6 +1722,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that system, the
+ * server must not properly terminate the client archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 564f010188..389a520417 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result set has the same format in every method.
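+ *
+ * As a quick reference, the type bytes currently used by 'copystream' are:
+ *
+ *   'n' - start of a new archive (archive name and tablespace path follow)
+ *   'd' - archive or manifest data
+ *   'p' - progress report (an 8-byte count of bytes completed)
+ *   'm' - the backup manifest will be sent next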
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,51 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
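+
+/*
+ * For example, at a transfer rate of 64MB/s these settings work out to
+ * roughly a thousand clock checks per second, but still only about one
+ * progress message per second in the steady state.
+ */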
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -37,6 +101,17 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -48,6 +123,193 @@ const bbsink_ops bbsink_copytblspc_ops = {
.end_backup = bbsink_copytblspc_end_backup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins
+ * with the type byte we're going to need, and then arrange things so
+ * that the data we're given will be written just after that type byte.
+ * That will allow us to ship the data with a single call to pq_putmessage
+ * and without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
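+
+	/*
+	 * The resulting layout is:
+	 *
+	 *		msgbuffer: ['d'][ bbs_buffer: bbs_buffer_length bytes ... ]
+	 *
+	 * so a payload of len bytes can later be shipped by
+	 * pq_putmessage('d', msgbuffer, len + 1) without any extra copying.
+	 */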
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 67d01d8b6e..0a9eb8ca7e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -167,6 +177,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -978,10 +995,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1008,8 +1026,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1049,16 +1067,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1069,8 +1087,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1080,6 +1098,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1333,28 +1662,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1477,46 +1810,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 41c9c367f7..31a6d2251c 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,6 +261,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2f50dc4de1..6a9e469b9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3766,7 +3766,10 @@ yyscan_t
z_stream
z_streamp
zic_t
+ArchiveStreamState
+backup_target_type
bbsink
+bbsink_copystream
bbsink_ops
bbsink_state
bbsink_throttle
--
2.24.3 (Apple Git-128)
Attachment: v6-0007-Support-base-backup-targets.patch
From 88675389c5877439f01016b9959b3f03d78a867c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 14:56:52 -0400
Subject: [PATCH v6 7/8] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
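To make this concrete, here is a hypothetical invocation (the path is
invented for illustration) and, roughly, the replication command that
pg_basebackup would construct from it:

    pg_basebackup -h myserver -Xnone -t server:/backups/mybackup

    BASE_BACKUP (TARGET 'server', TARGET_DETAIL '/backups/mybackup', ...)

Similarly, 'pg_basebackup -Xnone -t blackhole' exercises the discard
target, which is useful for testing this code path without writing
anything anywhere.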
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 301 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 197 +++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 556 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 7fb7b1cf66..ed16c6861f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -253,14 +256,38 @@ perform_base_backup(basebackup_options *opt)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt->target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt->target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt->target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt->target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -711,6 +738,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -856,25 +885,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -886,6 +925,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 389a520417..9104455700 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -127,11 +130,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -204,8 +208,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -286,8 +294,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..dff930c3c9
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,301 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write a chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index 1606463291..d1927e4f81 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -121,7 +121,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index ef7e6bfb77..a910915ccd 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0a9eb8ca7e..f9e91acff1 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -109,7 +109,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -126,6 +126,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -357,6 +358,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1216,15 +1219,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1296,24 +1306,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed to inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1684,7 +1702,33 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1779,8 +1823,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first one since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1791,7 +1840,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1874,7 +1924,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2008,8 +2058,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2031,7 +2084,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2065,6 +2118,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2115,7 +2169,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2156,6 +2210,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2288,18 +2345,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2309,6 +2398,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2325,6 +2424,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2358,8 +2460,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2380,6 +2492,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2387,6 +2500,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2396,6 +2512,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2438,11 +2557,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 31a6d2251c..7365b39e23 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,9 +261,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6007827b44..6af924b6d4 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6a9e469b9d..04b2830203 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3771,6 +3771,7 @@ backup_target_type
bbsink
bbsink_copystream
bbsink_ops
+bbsink_server
bbsink_state
bbsink_throttle
bbstreamer
--
2.24.3 (Apple Git-128)
Attachment: v6-0001-Flexible-options-for-BASE_BACKUP.patch
From 3172200ac7ff934f8add6e46fb2a5fe2e741d957 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 21 Sep 2021 12:22:07 -0400
Subject: [PATCH v6 1/8] Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed by Fabien Coelho and Sergei Kornilov.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
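As an illustration (the label and rate are made-up values), a command
that the old hard-coded syntax spells as:

    BASE_BACKUP LABEL 'mylabel' FAST NOWAIT MAX_RATE 1024

becomes, in the new syntax:

    BASE_BACKUP (LABEL 'mylabel', CHECKPOINT 'fast', WAIT 0, MAX_RATE 1024)

pg_basebackup sends the parenthesized form only when the server reports
version 15 or newer, and otherwise falls back to the legacy keywords.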
---
doc/src/sgml/protocol.sgml | 68 +++++++++++--------
src/backend/replication/basebackup.c | 51 ++++++++------
src/backend/replication/repl_gram.y | 97 +++++++++++++++++++++++----
src/bin/pg_basebackup/pg_basebackup.c | 71 +++++++++++++-------
src/bin/pg_basebackup/streamutil.c | 61 +++++++++++++++++
src/bin/pg_basebackup/streamutil.h | 12 ++++
6 files changed, 276 insertions(+), 84 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..a5c07bfefd 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2517,8 +2517,7 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
- <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ]
- <indexterm><primary>BASE_BACKUP</primary></indexterm>
+ <term><literal>BASE_BACKUP</literal> [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
</term>
<listitem>
<para>
@@ -2540,52 +2539,55 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>PROGRESS</literal></term>
+ <term><literal>PROGRESS [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Request information required to generate a progress report. This will
- send back an approximate size in the header of each tablespace, which
- can be used to calculate how far along the stream is done. This is
- calculated by enumerating all the file sizes once before the transfer
- is even started, and might as such have a negative impact on the
- performance. In particular, it might take longer before the first data
+ If set to true, request information required to generate a progress
+ report. This will send back an approximate size in the header of each
+ tablespace, which can be used to calculate how far along the stream
+ is done. This is calculated by enumerating all the file sizes once
+ before the transfer is even started, and might as such have a
+ negative impact on the performance. In particular, it might take
+ longer before the first data
is streamed. Since the database files can change during the backup,
the size is only approximate and might both grow and shrink between
the time of approximation and the sending of the actual files.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>FAST</literal></term>
+ <term><literal>CHECKPOINT { 'fast' | 'spread' }</literal></term>
<listitem>
<para>
- Request a fast checkpoint.
+ Sets the type of checkpoint to be performed at the beginning of the
+ base backup. The default is <literal>spread</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>WAL</literal></term>
+ <term><literal>WAL [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Include the necessary WAL segments in the backup. This will include
- all the files between start and stop backup in the
+ If set to true, include the necessary WAL segments in the backup.
+ This will include all the files between start and stop backup in the
<filename>pg_wal</filename> directory of the base directory tar
- file.
+ file. The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>NOWAIT</literal></term>
+ <term><literal>WAIT [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- By default, the backup will wait until the last required WAL
+ If set to true, the backup will wait until the last required WAL
segment has been archived, or emit a warning if log archiving is
- not enabled. Specifying <literal>NOWAIT</literal> disables both
- the waiting and the warning, leaving the client responsible for
- ensuring the required log is available.
+ not enabled. If false, the backup will neither wait nor warn,
+ leaving the client responsible for ensuring the required log is
+ available. The default is true.
</para>
</listitem>
</varlistentry>
@@ -2605,25 +2607,25 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>TABLESPACE_MAP</literal></term>
+ <term><literal>TABLESPACE_MAP [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Include information about symbolic links present in the directory
- <filename>pg_tblspc</filename> in a file named
+ If true, include information about symbolic links present in the
+ directory <filename>pg_tblspc</filename> in a file named
<filename>tablespace_map</filename>. The tablespace map file includes
each symbolic link name as it exists in the directory
<filename>pg_tblspc/</filename> and the full path of that symbolic link.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+ <term><literal>VERIFY_CHECKSUMS [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- By default, checksums are verified during a base backup if they are
- enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
- this verification.
+ If true, checksums are verified during a base backup if they are
+ enabled. If false, this is skipped. The default is true.
</para>
</listitem>
</varlistentry>
@@ -2708,6 +2710,7 @@ The commands accepted in replication mode are:
</varlistentry>
</variablelist>
</para>
+
<para>
After the second regular result set, one or more CopyOutResponse results
will be sent, one for the main data directory and one for each additional tablespace other
@@ -2788,6 +2791,17 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ]
+ </term>
+ <listitem>
+ <para>
+ For compatibility with older releases, this alternative syntax for
+ the <literal>BASE_BACKUP</literal> command is still supported.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e09108d0ec..4c97ab7b5a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "catalog/pg_type.h"
#include "common/file_perm.h"
+#include "commands/defrem.h"
#include "commands/progress.h"
#include "lib/stringinfo.h"
#include "libpq/libpq.h"
@@ -764,7 +765,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ListCell *lopt;
bool o_label = false;
bool o_progress = false;
- bool o_fast = false;
+ bool o_checkpoint = false;
bool o_nowait = false;
bool o_wal = false;
bool o_maxrate = false;
@@ -787,7 +788,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->label = strVal(defel->arg);
+ opt->label = defGetString(defel);
o_label = true;
}
else if (strcmp(defel->defname, "progress") == 0)
@@ -796,25 +797,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->progress = true;
+ opt->progress = defGetBoolean(defel);
o_progress = true;
}
- else if (strcmp(defel->defname, "fast") == 0)
+ else if (strcmp(defel->defname, "checkpoint") == 0)
{
- if (o_fast)
+ char *optval = defGetString(defel);
+
+ if (o_checkpoint)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->fastcheckpoint = true;
- o_fast = true;
+ if (pg_strcasecmp(optval, "fast") == 0)
+ opt->fastcheckpoint = true;
+ else if (pg_strcasecmp(optval, "spread") == 0)
+ opt->fastcheckpoint = false;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized checkpoint type: \"%s\"",
+ optval)));
+ o_checkpoint = true;
}
- else if (strcmp(defel->defname, "nowait") == 0)
+ else if (strcmp(defel->defname, "wait") == 0)
{
if (o_nowait)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->nowait = true;
+ opt->nowait = !defGetBoolean(defel);
o_nowait = true;
}
else if (strcmp(defel->defname, "wal") == 0)
@@ -823,19 +834,19 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->includewal = true;
+ opt->includewal = defGetBoolean(defel);
o_wal = true;
}
else if (strcmp(defel->defname, "max_rate") == 0)
{
- long maxrate;
+ int64 maxrate;
if (o_maxrate)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- maxrate = intVal(defel->arg);
+ maxrate = defGetInt64(defel);
if (maxrate < MAX_RATE_LOWER || maxrate > MAX_RATE_UPPER)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
@@ -851,21 +862,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->sendtblspcmapfile = true;
+ opt->sendtblspcmapfile = defGetBoolean(defel);
o_tablespace_map = true;
}
- else if (strcmp(defel->defname, "noverify_checksums") == 0)
+ else if (strcmp(defel->defname, "verify_checksums") == 0)
{
if (o_noverify_checksums)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- noverify_checksums = true;
+ noverify_checksums = !defGetBoolean(defel);
o_noverify_checksums = true;
}
else if (strcmp(defel->defname, "manifest") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
bool manifest_bool;
if (o_manifest)
@@ -890,7 +901,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "manifest_checksums") == 0)
{
- char *optval = strVal(defel->arg);
+ char *optval = defGetString(defel);
if (o_manifest_checksums)
ereport(ERROR,
@@ -905,8 +916,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
o_manifest_checksums = true;
}
else
- elog(ERROR, "option \"%s\" not recognized",
- defel->defname);
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname));
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..3b59d62ed8 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -95,13 +95,13 @@ static SQLCmd *make_sqlcmd(void);
%type <node> base_backup start_replication start_logical_replication
create_replication_slot drop_replication_slot identify_system
timeline_history show sql_cmd
-%type <list> base_backup_opt_list
-%type <defelt> base_backup_opt
+%type <list> base_backup_legacy_opt_list generic_option_list
+%type <defelt> base_backup_legacy_opt generic_option
%type <uintval> opt_timeline
%type <list> plugin_options plugin_opt_list
%type <defelt> plugin_opt_elem
%type <node> plugin_opt_arg
-%type <str> opt_slot var_name
+%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
%type <list> create_slot_opt_list
%type <defelt> create_slot_opt
@@ -157,12 +157,24 @@ var_name: IDENT { $$ = $1; }
;
/*
+ * BASE_BACKUP ( option [ 'value' ] [, ...] )
+ *
+ * We also still support the legacy syntax:
+ *
* BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
* [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
* [MANIFEST %s] [MANIFEST_CHECKSUMS %s]
+ *
+ * Future options should be supported only using the new syntax.
*/
base_backup:
- K_BASE_BACKUP base_backup_opt_list
+ K_BASE_BACKUP '(' generic_option_list ')'
+ {
+ BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
+ cmd->options = $3;
+ $$ = (Node *) cmd;
+ }
+ | K_BASE_BACKUP base_backup_legacy_opt_list
{
BaseBackupCmd *cmd = makeNode(BaseBackupCmd);
cmd->options = $2;
@@ -170,14 +182,14 @@ base_backup:
}
;
-base_backup_opt_list:
- base_backup_opt_list base_backup_opt
+base_backup_legacy_opt_list:
+ base_backup_legacy_opt_list base_backup_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-base_backup_opt:
+base_backup_legacy_opt:
K_LABEL SCONST
{
$$ = makeDefElem("label",
@@ -190,8 +202,8 @@ base_backup_opt:
}
| K_FAST
{
- $$ = makeDefElem("fast",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("checkpoint",
+ (Node *)makeString("fast"), -1);
}
| K_WAL
{
@@ -200,8 +212,8 @@ base_backup_opt:
}
| K_NOWAIT
{
- $$ = makeDefElem("nowait",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("wait",
+ (Node *)makeInteger(false), -1);
}
| K_MAX_RATE UCONST
{
@@ -215,8 +227,8 @@ base_backup_opt:
}
| K_NOVERIFY_CHECKSUMS
{
- $$ = makeDefElem("noverify_checksums",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("verify_checksums",
+ (Node *)makeInteger(false), -1);
}
| K_MANIFEST SCONST
{
@@ -422,6 +434,65 @@ plugin_opt_arg:
sql_cmd:
IDENT { $$ = (Node *) make_sqlcmd(); }
;
+
+generic_option_list:
+ generic_option_list ',' generic_option
+ { $$ = lappend($1, $3); }
+ | generic_option
+ { $$ = list_make1($1); }
+ ;
+
+generic_option:
+ ident_or_keyword
+ {
+ $$ = makeDefElem($1, NULL, -1);
+ }
+ | ident_or_keyword IDENT
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword SCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeString($2), -1);
+ }
+ | ident_or_keyword UCONST
+ {
+ $$ = makeDefElem($1, (Node *) makeInteger($2), -1);
+ }
+ ;
+
+ident_or_keyword:
+ IDENT { $$ = $1; }
+ | K_BASE_BACKUP { $$ = "base_backup"; }
+ | K_IDENTIFY_SYSTEM { $$ = "identify_system"; }
+ | K_SHOW { $$ = "show"; }
+ | K_START_REPLICATION { $$ = "start_replication"; }
+ | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
+ | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
+ | K_TIMELINE_HISTORY { $$ = "timeline_history"; }
+ | K_LABEL { $$ = "label"; }
+ | K_PROGRESS { $$ = "progress"; }
+ | K_FAST { $$ = "fast"; }
+ | K_WAIT { $$ = "wait"; }
+ | K_NOWAIT { $$ = "nowait"; }
+ | K_MAX_RATE { $$ = "max_rate"; }
+ | K_WAL { $$ = "wal"; }
+ | K_TABLESPACE_MAP { $$ = "tablespace_map"; }
+ | K_NOVERIFY_CHECKSUMS { $$ = "noverify_checksums"; }
+ | K_TIMELINE { $$ = "timeline"; }
+ | K_PHYSICAL { $$ = "physical"; }
+ | K_LOGICAL { $$ = "logical"; }
+ | K_SLOT { $$ = "slot"; }
+ | K_RESERVE_WAL { $$ = "reserve_wal"; }
+ | K_TEMPORARY { $$ = "temporary"; }
+ | K_TWO_PHASE { $$ = "two_phase"; }
+ | K_EXPORT_SNAPSHOT { $$ = "export_snapshot"; }
+ | K_NOEXPORT_SNAPSHOT { $$ = "noexport_snapshot"; }
+ | K_USE_SNAPSHOT { $$ = "use_snapshot"; }
+ | K_MANIFEST { $$ = "manifest"; }
+ | K_MANIFEST_CHECKSUMS { $$ = "manifest_checksums"; }
+ ;
+
%%
static SQLCmd *
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 669aa207a3..27ee6394cf 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1804,10 +1804,6 @@ BaseBackup(void)
TimeLineID latesttli;
TimeLineID starttli;
char *basebkp;
- char escaped_label[MAXPGPATH];
- char *maxrate_clause = NULL;
- char *manifest_clause = NULL;
- char *manifest_checksums_clause = "";
int i;
char xlogstart[64];
char xlogend[64];
@@ -1816,8 +1812,11 @@ BaseBackup(void)
int serverVersion,
serverMajor;
int writing_to_stdout;
+ bool use_new_option_syntax = false;
+ PQExpBufferData buf;
Assert(conn != NULL);
+ initPQExpBuffer(&buf);
/*
* Check server version. BASE_BACKUP command was introduced in 9.1, so we
@@ -1835,6 +1834,8 @@ BaseBackup(void)
serverver ? serverver : "'unknown'");
exit(1);
}
+ if (serverMajor >= 1500)
+ use_new_option_syntax = true;
/*
* If WAL streaming was requested, also check that the server is new
@@ -1865,20 +1866,48 @@ BaseBackup(void)
/*
* Start the actual backup
*/
- PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);
-
+ AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
+ if (estimatesize)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
+ if (includewal == FETCH_WAL)
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
+ if (fastcheckpoint)
+ {
+ if (use_new_option_syntax)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "CHECKPOINT", "fast");
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "FAST");
+ }
+ if (includewal != NO_WAL)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "WAIT", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "NOWAIT");
+ }
if (maxrate > 0)
- maxrate_clause = psprintf("MAX_RATE %u", maxrate);
+ AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
+ maxrate);
+ if (format == 't')
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+ if (!verify_checksums)
+ {
+ if (use_new_option_syntax)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax,
+ "VERIFY_CHECKSUMS", 0);
+ else
+ AppendPlainCommandOption(&buf, use_new_option_syntax,
+ "NOVERIFY_CHECKSUMS");
+ }
if (manifest)
{
- if (manifest_force_encode)
- manifest_clause = "MANIFEST 'force-encode'";
- else
- manifest_clause = "MANIFEST 'yes'";
+ AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
- manifest_checksums_clause = psprintf("MANIFEST_CHECKSUMS '%s'",
- manifest_checksums);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
if (verbose)
@@ -1893,18 +1922,10 @@ BaseBackup(void)
fprintf(stderr, "\n");
}
- basebkp =
- psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s %s",
- escaped_label,
- estimatesize ? "PROGRESS" : "",
- includewal == FETCH_WAL ? "WAL" : "",
- fastcheckpoint ? "FAST" : "",
- includewal == NO_WAL ? "" : "NOWAIT",
- maxrate_clause ? maxrate_clause : "",
- format == 't' ? "TABLESPACE_MAP" : "",
- verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
- manifest_clause ? manifest_clause : "",
- manifest_checksums_clause);
+ if (use_new_option_syntax && buf.len > 0)
+ basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
+ else
+ basebkp = psprintf("BASE_BACKUP %s", buf.data);
if (PQsendQuery(conn, basebkp) == 0)
{
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index f5b3b476e5..d782b81adc 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -603,6 +603,67 @@ DropReplicationSlot(PGconn *conn, const char *slot_name)
return true;
}
+/*
+ * Append a "plain" option - one with no value - to a server command that
+ * is being constructed.
+ *
+ * In the old syntax, all options were parser keywords, so you could just
+ * write things like SOME_COMMAND OPTION1 OPTION2 'opt2value' OPTION3 42. The
+ * new syntax uses a comma-separated list surrounded by parentheses, so the
+ * equivalent is SOME_COMMAND (OPTION1, OPTION2 'opt2value', OPTION3 42).
+ */
+void
+AppendPlainCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name)
+{
+ if (buf->len > 0 && buf->data[buf->len - 1] != '(')
+ {
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(buf, ", ");
+ else
+ appendPQExpBufferChar(buf, ' ');
+ }
+
+ appendPQExpBuffer(buf, " %s", option_name);
+}
+
+/*
+ * Append an option with an associated string value to a server command that
+ * is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendStringCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, char *option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ if (option_value != NULL)
+ {
+ size_t length = strlen(option_value);
+ char *escaped_value = palloc(1 + 2 * length);
+
+ PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
+ appendPQExpBuffer(buf, " '%s'", escaped_value);
+ pfree(escaped_value);
+ }
+}
+
+/*
+ * Append an option with an associated integer value to a server command
+ * that is being constructed.
+ *
+ * See comments for AppendPlainCommandOption, above.
+ */
+void
+AppendIntegerCommandOption(PQExpBuffer buf, bool use_new_option_syntax,
+ char *option_name, int32 option_value)
+{
+ AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
+
+ appendPQExpBuffer(buf, " %d", option_value);
+}
/*
* Frontend version of GetCurrentTimestamp(), since we are not linked with
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index 504803b976..65135c79e0 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -15,6 +15,7 @@
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"
#include "libpq-fe.h"
+#include "pqexpbuffer.h"
extern const char *progname;
extern char *connection_string;
@@ -40,6 +41,17 @@ extern bool RunIdentifySystem(PGconn *conn, char **sysid,
TimeLineID *starttli,
XLogRecPtr *startpos,
char **db_name);
+
+extern void AppendPlainCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name);
+extern void AppendStringCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, char *option_value);
+extern void AppendIntegerCommandOption(PQExpBuffer buf,
+ bool use_new_option_syntax,
+ char *option_name, int32 option_value);
+
extern bool RetrieveWalSegSize(PGconn *conn);
extern TimestampTz feGetCurrentTimestamp(void);
extern void feTimestampDifference(TimestampTz start_time, TimestampTz stop_time,
--
2.24.3 (Apple Git-128)
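Side note on the syntax the helpers above generate: for a hypothetical
run with a label, progress estimation, and checksum verification
disabled, the constructed command would look roughly like this (label
value illustrative, exact spacing not shown):

    Old syntax (server < 15), bare parser keywords:
        BASE_BACKUP LABEL 'mybackup' PROGRESS NOWAIT NOVERIFY_CHECKSUMS

    New syntax (server >= 15), parenthesized option list:
        BASE_BACKUP (LABEL 'mybackup', PROGRESS, WAIT 0, VERIFY_CHECKSUMS 0)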
v6-0008-WIP-Server-side-gzip-compression.patch
From 157bca07f356945eaf9613038452b85347376d9e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 13 Sep 2021 12:07:01 -0400
Subject: [PATCH v6 8/8] WIP: Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
---
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 303 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 38 ++-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 382 insertions(+), 2 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ed16c6861f..61c76160d1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -292,6 +300,10 @@ perform_base_backup(basebackup_options *opt)
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt->compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt->compression_level);
+
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -740,11 +752,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -914,6 +928,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..3d2fa93e55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,303 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index f9e91acff1..d79fafaeb6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -987,7 +988,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -996,14 +999,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1732,6 +1753,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2142,6 +2174,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2321,6 +2354,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 7365b39e23..10a316cacd 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -263,6 +263,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
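As a quick usage sketch for this patch (host, directory, and level are
placeholders): since pg_basebackup cannot yet decompress .gz archives
on the client side, tar format must be requested:

    pg_basebackup -h myserver -D /path/to/backup -Ft --server-compression=gzip9

This simply adds COMPRESSION 'gzip9' to the BASE_BACKUP option list;
parse_basebackup_options() then splits that into the gzip algorithm and
compression level 9.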
v6-0005-Introduce-bbstreamer-abstraction-to-modularize-pg.patch
From 7613c1b6548af2623a664d2df7212ceefe0ad3fe Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 12:00:34 -0400
Subject: [PATCH v6 5/8] Introduce 'bbstreamer' abstraction to modularize
pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unfortunately, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and then gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These changes do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
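To illustrate how these pieces compose, the chain from the example
above could be assembled roughly as follows. This is only a sketch:
the output file name, compression level, and the PQExpBuffer holding
the recovery settings are placeholders, and the real call sites in
pg_basebackup.c are more involved.

    bbstreamer *streamer;

    /* Final consumer: gzip the byte stream and write it to a file. */
    streamer = bbstreamer_gzip_writer_new("base.tar.gz", NULL, 9);

    /* Rebuild tar headers for any members modified upstream. */
    streamer = bbstreamer_tar_archiver_new(streamer);

    /* Edit postgresql.auto.conf and inject recovery settings. */
    streamer = bbstreamer_recovery_injector_new(streamer, true,
                                                recoveryconfcontents);

    /* Parse the raw tar stream from the server into typed chunks. */
    streamer = bbstreamer_tar_parser_new(streamer);

Each chunk received from the server is then handed to the tar parser,
and each streamer forwards its output to the next one in the chain.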
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 +++++
src/bin/pg_basebackup/bbstreamer_file.c | 579 ++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 912 +++++-----------------
src/tools/pgindent/typedefs.list | 10 +
7 files changed, 1697 insertions(+), 727 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..8fda09dcd4 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -34,10 +34,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -60,7 +66,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..b24dc848c1
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should exactly BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
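+
+/*
+ * Illustrative example (not part of the format itself): a parsed archive
+ * containing two members is forwarded as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS..., BBSTREAMER_MEMBER_TRAILER for each member
+ * in turn, followed by a single BBSTREAMER_ARCHIVE_TRAILER chunk.
+ */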
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content) (bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize) (bbstreamer *streamer);
+ void (*free) (bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..03e1ea2550
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,579 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include <unistd.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map) (const char *);
+ void (*report_output_file) (const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it also identifies the
+ * file to be opened and closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any
+ * symbolic link, and which should return a replacement pathname to be used
+ * in its place. If NULL, the symbolic link target is used without
+ * modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a
+ * new output file. The pathname to that file is passed as an argument. If
+ * NULL, the call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = pstrdup(basepath);
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6 clusters) will
+ * have been created by the wal receiver process. Also, when the WAL
+ * directory location was specified, pg_wal (or pg_xlog) has already
+ * been created as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ pfree(mystreamer->basepath);
+ pfree(mystreamer);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..4d15251fdc
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can edit recovery configuration into an archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+extern bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf; on
+ * older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
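To make the header/contents/trailer protocol that bbstreamer_inject_file relies on easier to follow, here is a minimal standalone sketch of the same pattern. It is not part of the patch; everything in it (toy_member, toy_sink, toy_inject) is invented for illustration, with a print statement standing in for the next bbstreamer in the chain:

/* Toy model of the bbstreamer chunk protocol; not part of the patch. */
#include <stdio.h>
#include <string.h>

typedef enum { TOY_HEADER, TOY_CONTENTS, TOY_TRAILER } toy_context;

typedef struct toy_member
{
    char        pathname[64];
    int         size;
} toy_member;

/* A terminal "sink" that just reports what it receives. */
static void
toy_sink(toy_member *member, const char *data, int len, toy_context context)
{
    const char *what[] = {"header", "contents", "trailer"};

    (void) data;                /* a real sink would write this out */
    printf("%s: %s (%d bytes)\n", what[context], member->pathname, len);
}

/*
 * Inject a synthetic member, mimicking bbstreamer_inject_file: one header
 * chunk, one contents chunk, and one trailer chunk, in that order.
 */
static void
toy_inject(const char *pathname, const char *data, int len)
{
    toy_member  member;

    strncpy(member.pathname, pathname, sizeof(member.pathname) - 1);
    member.pathname[sizeof(member.pathname) - 1] = '\0';
    member.size = len;

    toy_sink(&member, NULL, 0, TOY_HEADER);
    toy_sink(&member, data, len, TOY_CONTENTS);
    toy_sink(&member, NULL, 0, TOY_TRAILER);
}

int
main(void)
{
    toy_inject("standby.signal", "", 0);
    return 0;
}

As in the real code, the header and trailer chunks carry no data at this point; a downstream step that understands the archive format is responsible for materializing them.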
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..5a9f587dca
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,444 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+extern bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+
+ /*
+ * If we're expecting an archive member header, accumulate a
+ * full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the file
+ * trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ if (mystreamer->base.bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not the
+ * start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+}
+
+/*
+ * Create a bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+extern bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size is a fatal error (and
+ * indicates a bug).
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two blocks of
+ * zero bytes. Not all tar programs require this, but some do. The server
+ * does not supply this trailer; if the input contains no archive trailer,
+ * bbstreamer_tar_parser_finalize will emit an empty one, which we then
+ * replace here.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+ /* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
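For reference, the hard-coded offsets that the FIXME in bbstreamer_tar_header complains about come straight from the ustar header layout (name at offset 0, mode at 100, uid at 108, gid at 116, size at 124, type flag at 156, link target at 157). Here is a standalone sketch of the same field extraction, separate from the patch; parse_octal is a simplified stand-in for read_tar_number and handles plain octal only:

/* Standalone sketch of ustar header field extraction; not part of the patch. */
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/* Simplified stand-in for read_tar_number: plain octal only. */
static unsigned long
parse_octal(const char *field, int len)
{
    unsigned long result = 0;
    int         i;

    for (i = 0; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        result = (result << 3) + (unsigned long) (field[i] - '0');
    return result;
}

int
main(void)
{
    char        block[TAR_BLOCK_SIZE] = {0};

    /* Fabricate a header for a 1234-byte regular file. */
    strcpy(&block[0], "base/16384/16385"); /* member name, offset 0 */
    strcpy(&block[100], "0000600");        /* mode in octal, offset 100 */
    strcpy(&block[124], "00000002322");    /* size: 1234 in octal, offset 124 */
    block[156] = '0';                      /* type flag: regular file */

    printf("name=%s size=%lu mode=%lo type=%c\n",
           &block[0],
           parse_octal(&block[124], 12),
           parse_octal(&block[100], 8),
           block[156]);
    return 0;
}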
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 27ee6394cf..67d01d8b6e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -28,18 +28,13 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
-#include "common/string.h"
#include "fe_utils/option_utils.h"
#include "fe_utils/recovery_gen.h"
-#include "fe_utils/string_utils.h"
#include "getopt_long.h"
-#include "libpq-fe.h"
-#include "pgtar.h"
-#include "pgtime.h"
-#include "pqexpbuffer.h"
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
@@ -62,34 +57,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -161,10 +131,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -190,14 +161,15 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force,
- bool finished);
-
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force, bool finished);
+
+static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported);
+static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -360,21 +332,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -763,6 +720,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -775,8 +740,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* is moved to the next line.
*/
static void
-progress_report(int tablespacenum, const char *filename,
- bool force, bool finished)
+progress_report(int tablespacenum, bool force, bool finished)
{
int percent;
char totaldone_str[32];
@@ -811,7 +775,7 @@ progress_report(int tablespacenum, const char *filename,
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -827,7 +791,7 @@ progress_report(int tablespacenum, const char *filename,
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -841,7 +805,7 @@ progress_report(int tablespacenum, const char *filename,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -987,257 +951,170 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
}
/*
- * Write a piece of tar data
+ * Figure out what to do with an archive received from the server based on
+ * the options selected by the user. We may just write the results directly
+ * to a file, or we might compress first, or we might extract the tar file
+ * and write each member separately. This function doesn't do any of that
+ * directly, but it works out what kind of bbstreamer we need to create so
+ * that the right stuff happens when, down the road, we actually receive
+ * the data.
*/
-static void
-writeTarData(WriteTarState *state, char *buf, int r)
+static bbstreamer *
+CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported)
{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-}
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer = NULL;
+ bool inject_manifest;
+ bool must_parse_archive;
-/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
- *
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
- */
-static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- char zerobuf[TAR_BLOCK_SIZE * 2];
- WriteTarState state;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
+ /*
+ * Normally, we emit the backup manifest as a separate file, but when
+ * we're writing a tarfile to stdout, we don't have that option, so
+ * include it in the one tarfile we've got.
+ */
+ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ /*
+ * We have to parse the archive if (1) we're supposed to extract it, or if
+ * (2) we need to inject backup_manifest or recovery configuration into it.
+ */
+ must_parse_archive = (format == 'p' || inject_manifest ||
+ (spclocation == NULL && writerecoveryconf));
- if (state.basetablespace)
+ if (format == 'p')
{
+ const char *directory;
+
/*
- * Base tablespaces
+ * In plain format, we must extract the archive. The data for the main
+ * tablespace will be written to the base directory, and the data for
+ * other tablespaces will be written to the directory where they're
+ * located on the server, after applying any user-specified tablespace
+ * mappings.
*/
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
-
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
- else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ directory = spclocation == NULL ? basedir
+ : get_tablespace_mapping(spclocation);
+ streamer = bbstreamer_extractor_new(directory,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
+ FILE *archive_file;
+ char archive_filename[MAXPGPATH];
+
/*
- * Specific tablespace
+ * In tar format, we just write the archive without extracting it.
+ * Normally, we write it to the archive name provided by the caller,
+ * but when the base directory is "-", we write to standard output
+ * instead.
*/
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(archive_filename, sizeof(archive_filename), "-");
+ archive_file = stdout;
}
else
-#endif
{
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
+ snprintf(archive_filename, sizeof(archive_filename),
+ "%s/%s", basedir, archive_name);
+ archive_file = NULL;
}
- }
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(archive_filename, ".gz", sizeof(archive_filename));
+ streamer = bbstreamer_gzip_writer_new(archive_filename,
+ archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+
+ /*
+ * If we need to parse the archive, we'll also need to re-archive:
+ * when the output format is tar, the only reason to parse the archive
+ * at all is to inject something into it, and whatever we inject must
+ * be written back out in tar format.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = archive_filename;
+ }
/*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
+ * If we're supposed to inject the backup manifest into the results,
+ * it should be done here, so that the file content can be injected
+ * directly, without worrying about the details of the tar format.
*/
+ if (inject_manifest)
+ manifest_inject_streamer = streamer;
- MemSet(zerobuf, 0, sizeof(zerobuf));
-
- if (state.basetablespace && writerecoveryconf)
+ /*
+ * If this is the main tablespace and we're supposed to write
+ * recovery information, arrange to do that.
+ */
+ if (spclocation == NULL && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ Assert(must_parse_archive);
+ streamer = bbstreamer_recovery_injector_new(streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ /*
+ * If we're doing anything that involves understanding the contents of
+ * the archive, we'll need to parse it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_parser_new(streamer);
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ /* Return the results. */
+ *manifest_inject_streamer_p = manifest_inject_streamer;
+ return streamer;
+}
- writeTarData(&state, header, sizeof(header));
+/*
+ * Receive raw tar data from the server, and stream it to the appropriate
+ * location. If we're writing a single tarfile to standard output, also
+ * receive the backup manifest and inject it into that tarfile.
+ */
+static void
+ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum)
+{
+ WriteTarState state;
+ bbstreamer *manifest_inject_streamer;
+ bool is_recovery_guc_supported;
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ /* Pass all COPY data through to the backup streamer. */
+ memset(&state, 0, sizeof(state));
+ is_recovery_guc_supported =
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ state.streamer = CreateBackupStreamer(archive_name, spclocation,
+ &manifest_inject_streamer,
+ is_recovery_guc_supported);
+ state.tablespacenum = tablespacenum;
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ progress_filename = NULL;
/*
- * Normally, we emit the backup manifest as a separate file, but when
- * we're writing a tarfile to stdout, we don't have that option, so
- * include it in the one tarfile we've got.
+ * CreateBackupStreamer() decides whether the backup manifest needs to
+ * be injected into the output at this stage; if so, it sets
+ * manifest_inject_streamer to a non-NULL value.
*/
- if (strcmp(basedir, "-") == 0 && manifest)
+ if (manifest_inject_streamer != NULL)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
+ /* Slurp the entire backup manifest into a buffer. */
initPQExpBuffer(&buf);
ReceiveBackupManifestInMemory(conn, &buf);
if (PQExpBufferDataBroken(buf))
@@ -1245,42 +1122,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
- termPQExpBuffer(&buf);
- }
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
+ /* Inject it into the output tarfile. */
+ bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ buf.data, buf.len);
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
+ /* Free memory. */
+ termPQExpBuffer(&buf);
}
- progress_report(rownum, state.filename, true, false);
+ /* Cleanup. */
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+
+ progress_report(tablespacenum, true, false);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1296,184 +1151,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
-
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
+ bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
+ progress_report(state->tablespacenum, false, false);
}
@@ -1498,242 +1179,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true, false);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2032,16 +1477,32 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
+ /* Receive a tar file for each tablespace in turn */
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named base.tar
+ * if it's the main data directory or <tablespaceoid>.tar if it's for
+ * another tablespace. CreateBackupStreamer() will arrange to add .gz
+ * to the archive name if pg_basebackup is performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
+ }
/*
* Now receive backup manifest, if appropriate.
@@ -2057,7 +1518,10 @@ BaseBackup(void)
ReceiveBackupManifest(conn);
if (showprogress)
- progress_report(PQntuples(res), NULL, true, true);
+ {
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true, true);
+ }
PQclear(res);
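The net effect of CreateBackupStreamer() is easiest to see as a chain-building exercise. The following standalone sketch (all names invented, booleans standing in for the real option variables) mirrors the decision logic above: start with a terminal writer or extractor, then wrap it with a re-archiver, a recovery injector, and a parser as needed:

/* Toy sketch of CreateBackupStreamer's chain-building logic; not part of the patch. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node
{
    const char *name;
    struct node *next;
} node;

static node *
wrap(const char *name, node *next)
{
    node       *n = malloc(sizeof(node));

    n->name = name;
    n->next = next;
    return n;
}

static void
build_chain(bool plain_format, bool inject_manifest, bool recovery_conf)
{
    bool        must_parse = plain_format || inject_manifest || recovery_conf;
    node       *chain;

    /* Terminal step: extract to disk, or write the archive somewhere. */
    chain = wrap(plain_format ? "extractor" : "writer", NULL);

    /* If we parse a tar stream but still emit tar, we must re-archive. */
    if (!plain_format && must_parse)
        chain = wrap("tar_archiver", chain);
    if (recovery_conf)
        chain = wrap("recovery_injector", chain);
    if (must_parse)
        chain = wrap("tar_parser", chain);

    while (chain != NULL)
    {
        node       *next = chain->next;

        printf("%s%s", chain->name, next ? " -> " : "\n");
        free(chain);
        chain = next;
    }
}

int
main(void)
{
    build_chain(false, true, true); /* tar to stdout, manifest + recovery */
    build_chain(true, false, true); /* plain format with recovery info */
    return 0;
}

Data flows left to right: the parser splits the raw COPY stream into typed chunks, the injector edits or adds members, the archiver regenerates any headers the injector invalidated, and the writer or extractor puts bytes on disk.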
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 050dcfda15..2f50dc4de1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3770,3 +3770,13 @@ bbsink
bbsink_ops
bbsink_state
bbsink_throttle
+bbstreamer
+bbstreamer_archive_context
+bbstreamer_extractor
+bbstreamer_gzip_writer
+bbstreamer_member
+bbstreamer_ops
+bbstreamer_plain_writer
+bbstreamer_recovery_injector
+bbstreamer_tar_archiver
+bbstreamer_tar_parser
--
2.24.3 (Apple Git-128)
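A recurring detail in the streamers above is the tar padding rule: member data is padded with zero bytes up to the next 512-byte boundary. A quick standalone illustration of the arithmetic, with padding_bytes playing the role of tarPaddingBytesRequired:

/* Standalone sketch of tar padding arithmetic; not PostgreSQL code. */
#include <stdio.h>

#define TAR_BLOCK_SIZE 512

/* Bytes of zero padding needed to reach the next block boundary. */
static int
padding_bytes(long long file_size)
{
    return (int) ((TAR_BLOCK_SIZE - file_size % TAR_BLOCK_SIZE) % TAR_BLOCK_SIZE);
}

int
main(void)
{
    long long   sizes[] = {0, 1, 511, 512, 513, 1234};
    int         i;

    for (i = 0; i < (int) (sizeof(sizes) / sizeof(sizes[0])); i++)
        printf("size %lld -> %d padding bytes\n", sizes[i],
               padding_bytes(sizes[i]));
    return 0;
}

This is also why a regenerated member header forces the trailer to be regenerated: if the member size changed, so did the amount of padding.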
v6-0003-Refactor-basebackup.c-s-_tarWriteDir-function.patch
From 02f123cd4c9b2a7c106b593128d0581810356282 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v6 3/8] Refactor basebackup.c's _tarWriteDir() function.
Sometimes, we replace a symbolic link that we find in the data
directory with an actual directory within the tarfile that we
create. _tarWriteDir was responsible both for making this
substitution and also for writing the tar header for the
resulting directory into the tar file. Make it do only the first
of those things, and rename to convert_link_to_directory.
Substantially larger refactoring of this source file is planned,
but this little bit seemed to make sense to commit
independently.
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 4c97ab7b5a..b31c36d918 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -71,8 +71,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1381,7 +1380,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1397,7 +1398,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1409,7 +1412,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1883,12 +1888,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1897,8 +1901,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.3 (Apple Git-128)
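The entire substance of convert_link_to_directory is a one-line rewrite of st_mode. A standalone sketch of the idea (POSIX-only, with 0700 standing in for pg_dir_create_mode):

/* Standalone sketch of convert_link_to_directory's mode rewrite; not PG code. */
#include <stdio.h>
#include <sys/stat.h>

#define TOY_DIR_CREATE_MODE 0700    /* stand-in for pg_dir_create_mode */

static void
toy_convert_link_to_directory(struct stat *statbuf)
{
    /* If it's a symlink, pretend it's a directory instead. */
    if (S_ISLNK(statbuf->st_mode))
        statbuf->st_mode = S_IFDIR | TOY_DIR_CREATE_MODE;
}

int
main(void)
{
    struct stat st = {0};

    st.st_mode = S_IFLNK | 0777;    /* fake a symlink's stat result */
    toy_convert_link_to_directory(&st);
    printf("is directory now? %s\n", S_ISDIR(st.st_mode) ? "yes" : "no");
    return 0;
}

The real function also covers Windows junction points via pgwin32_is_junction; the sketch shows only the symlink case.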
v6-0002-Flexible-options-for-CREATE_REPLICATION_SLOT.patch
From e2f3eb25bf15e3f4de677cd1bbcb98b0cd220ce4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 21 Sep 2021 12:22:15 -0400
Subject: [PATCH v6 2/8] Flexible options for CREATE_REPLICATION_SLOT.
Like BASE_BACKUP, CREATE_REPLICATION_SLOT has historically used a
hard-coded syntax. To improve future extensibility, adopt a flexible
options syntax here, too.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_receivewal and
pg_recvlogical use it.
Patch by me, reviewed by Fabien Coelho and Sergei Kornilov.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
---
doc/src/sgml/protocol.sgml | 37 ++++++++++++-----
.../libpqwalreceiver/libpqwalreceiver.c | 16 ++++----
src/backend/replication/repl_gram.y | 35 +++++++++-------
src/backend/replication/walsender.c | 40 ++++++++++---------
src/bin/pg_basebackup/streamutil.c | 40 ++++++++++++++++---
5 files changed, 111 insertions(+), 57 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a5c07bfefd..aed06b968e 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1914,7 +1914,7 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry id="protocol-replication-create-slot" xreflabel="CREATE_REPLICATION_SLOT">
- <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable> [ <literal>TEMPORARY</literal> ] { <literal>PHYSICAL</literal> [ <literal>RESERVE_WAL</literal> ] | <literal>LOGICAL</literal> <replaceable class="parameter">output_plugin</replaceable> [ <literal>EXPORT_SNAPSHOT</literal> | <literal>NOEXPORT_SNAPSHOT</literal> | <literal>USE_SNAPSHOT</literal> | <literal>TWO_PHASE</literal> ] }
+ <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable> [ <literal>TEMPORARY</literal> ] { <literal>PHYSICAL</literal> | <literal>LOGICAL</literal> } [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
<indexterm><primary>CREATE_REPLICATION_SLOT</primary></indexterm>
</term>
<listitem>
@@ -1954,46 +1954,50 @@ The commands accepted in replication mode are:
</para>
</listitem>
</varlistentry>
+ </variablelist>
+
+ <para>The following options are supported:</para>
+ <variablelist>
<varlistentry>
- <term><literal>TWO_PHASE</literal></term>
+ <term><literal>TWO_PHASE [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Specify that this logical replication slot supports decoding of two-phase
+ If true, this logical replication slot supports decoding of two-phase
transactions. With this option, two-phase commands like
<literal>PREPARE TRANSACTION</literal>, <literal>COMMIT PREPARED</literal>
and <literal>ROLLBACK PREPARED</literal> are decoded and transmitted.
The transaction will be decoded and transmitted at
<literal>PREPARE TRANSACTION</literal> time.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>RESERVE_WAL</literal></term>
+ <term><literal>RESERVE_WAL [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
<para>
- Specify that this physical replication slot reserves <acronym>WAL</acronym>
+ If true, this physical replication slot reserves <acronym>WAL</acronym>
immediately. Otherwise, <acronym>WAL</acronym> is only reserved upon
connection from a streaming replication client.
+ The default is false.
</para>
</listitem>
</varlistentry>
<varlistentry>
- <term><literal>EXPORT_SNAPSHOT</literal></term>
- <term><literal>NOEXPORT_SNAPSHOT</literal></term>
- <term><literal>USE_SNAPSHOT</literal></term>
+ <term><literal>SNAPSHOT { 'export' | 'use' | 'nothing' }</literal></term>
<listitem>
<para>
Decides what to do with the snapshot created during logical slot
- initialization. <literal>EXPORT_SNAPSHOT</literal>, which is the default,
+ initialization. <literal>'export'</literal>, which is the default,
will export the snapshot for use in other sessions. This option can't
- be used inside a transaction. <literal>USE_SNAPSHOT</literal> will use the
+ be used inside a transaction. <literal>'use'</literal> will use the
snapshot for the current transaction executing the command. This
option must be used in a transaction, and
<literal>CREATE_REPLICATION_SLOT</literal> must be the first command
- run in that transaction. Finally, <literal>NOEXPORT_SNAPSHOT</literal> will
+ run in that transaction. Finally, <literal>'nothing'</literal> will
just use the snapshot for logical decoding as normal but won't do
anything else with it.
</para>
@@ -2052,6 +2056,17 @@ The commands accepted in replication mode are:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</replaceable> [ <literal>TEMPORARY</literal> ] { <literal>PHYSICAL</literal> [ <literal>RESERVE_WAL</literal> ] | <literal>LOGICAL</literal> <replaceable class="parameter">output_plugin</replaceable> [ <literal>EXPORT_SNAPSHOT</literal> | <literal>NOEXPORT_SNAPSHOT</literal> | <literal>USE_SNAPSHOT</literal> | <literal>TWO_PHASE</literal> ] }
+ </term>
+ <listitem>
+ <para>
+ For compatibility with older releases, this alternative syntax for
+ the <literal>CREATE_REPLICATION_SLOT</literal> command is still supported.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>START_REPLICATION</literal> [ <literal>SLOT</literal> <replaceable class="parameter">slot_name</replaceable> ] [ <literal>PHYSICAL</literal> ] <replaceable class="parameter">XXX/XXX</replaceable> [ <literal>TIMELINE</literal> <replaceable class="parameter">tli</replaceable> ]
<indexterm><primary>START_REPLICATION</primary></indexterm>
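For a concrete before-and-after, here is a sketch of the command strings a client might send under the two syntaxes. It is illustrative only, built with plain snprintf rather than the StringInfo machinery the real client code below uses:

/* Sketch: old vs. new CREATE_REPLICATION_SLOT syntax; not PostgreSQL code. */
#include <stdio.h>

int
main(void)
{
    char        oldcmd[256];
    char        newcmd[256];

    /* Historical hard-coded syntax. */
    snprintf(oldcmd, sizeof(oldcmd),
             "CREATE_REPLICATION_SLOT \"%s\" LOGICAL pgoutput "
             "TWO_PHASE EXPORT_SNAPSHOT", "myslot");

    /* New flexible options syntax. */
    snprintf(newcmd, sizeof(newcmd),
             "CREATE_REPLICATION_SLOT \"%s\" LOGICAL pgoutput "
             "(SNAPSHOT 'export', TWO_PHASE)", "myslot");

    printf("old: %s\nnew: %s\n", oldcmd, newcmd);
    return 0;
}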
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 19ea159af4..e3a783ebec 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -872,26 +872,28 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
if (conn->logical)
{
- appendStringInfoString(&cmd, " LOGICAL pgoutput");
- if (two_phase)
- appendStringInfoString(&cmd, " TWO_PHASE");
+ appendStringInfoString(&cmd, " LOGICAL pgoutput (");
switch (snapshot_action)
{
case CRS_EXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " EXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, "SNAPSHOT 'export'");
break;
case CRS_NOEXPORT_SNAPSHOT:
- appendStringInfoString(&cmd, " NOEXPORT_SNAPSHOT");
+ appendStringInfoString(&cmd, "SNAPSHOT 'nothing'");
break;
case CRS_USE_SNAPSHOT:
- appendStringInfoString(&cmd, " USE_SNAPSHOT");
+ appendStringInfoString(&cmd, "SNAPSHOT 'use'");
break;
}
+
+ if (two_phase)
+ appendStringInfoString(&cmd, ", TWO_PHASE");
+ appendStringInfoChar(&cmd, ')');
}
else
{
- appendStringInfoString(&cmd, " PHYSICAL RESERVE_WAL");
+ appendStringInfoString(&cmd, " PHYSICAL (RESERVE_WAL)");
}
res = libpqrcv_PQexec(conn->streamConn, cmd.data);
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 3b59d62ed8..126380e2df 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -103,8 +103,8 @@ static SQLCmd *make_sqlcmd(void);
%type <node> plugin_opt_arg
%type <str> opt_slot var_name ident_or_keyword
%type <boolval> opt_temporary
-%type <list> create_slot_opt_list
-%type <defelt> create_slot_opt
+%type <list> create_slot_options create_slot_legacy_opt_list
+%type <defelt> create_slot_legacy_opt
%%
@@ -243,8 +243,8 @@ base_backup_legacy_opt:
;
create_replication_slot:
- /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL RESERVE_WAL */
- K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY PHYSICAL [options] */
+ K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -254,8 +254,8 @@ create_replication_slot:
cmd->options = $5;
$$ = (Node *) cmd;
}
- /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
- | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+ /* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin [options] */
+ | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_options
{
CreateReplicationSlotCmd *cmd;
cmd = makeNode(CreateReplicationSlotCmd);
@@ -268,28 +268,33 @@ create_replication_slot:
}
;
-create_slot_opt_list:
- create_slot_opt_list create_slot_opt
+create_slot_options:
+ '(' generic_option_list ')' { $$ = $2; }
+ | create_slot_legacy_opt_list { $$ = $1; }
+ ;
+
+create_slot_legacy_opt_list:
+ create_slot_legacy_opt_list create_slot_legacy_opt
{ $$ = lappend($1, $2); }
| /* EMPTY */
{ $$ = NIL; }
;
-create_slot_opt:
+create_slot_legacy_opt:
K_EXPORT_SNAPSHOT
{
- $$ = makeDefElem("export_snapshot",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("snapshot",
+ (Node *)makeString("export"), -1);
}
| K_NOEXPORT_SNAPSHOT
{
- $$ = makeDefElem("export_snapshot",
- (Node *)makeInteger(false), -1);
+ $$ = makeDefElem("snapshot",
+ (Node *)makeString("nothing"), -1);
}
| K_USE_SNAPSHOT
{
- $$ = makeDefElem("use_snapshot",
- (Node *)makeInteger(true), -1);
+ $$ = makeDefElem("snapshot",
+ (Node *)makeString("use"), -1);
}
| K_RESERVE_WAL
{
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ca2a11389..b811a5c0ef 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -872,26 +872,30 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
{
DefElem *defel = (DefElem *) lfirst(lc);
- if (strcmp(defel->defname, "export_snapshot") == 0)
+ if (strcmp(defel->defname, "snapshot") == 0)
{
+ char *action;
+
if (snapshot_action_given || cmd->kind != REPLICATION_KIND_LOGICAL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
+ action = defGetString(defel);
snapshot_action_given = true;
- *snapshot_action = defGetBoolean(defel) ? CRS_EXPORT_SNAPSHOT :
- CRS_NOEXPORT_SNAPSHOT;
- }
- else if (strcmp(defel->defname, "use_snapshot") == 0)
- {
- if (snapshot_action_given || cmd->kind != REPLICATION_KIND_LOGICAL)
+
+ if (strcmp(action, "export") == 0)
+ *snapshot_action = CRS_EXPORT_SNAPSHOT;
+ else if (strcmp(action, "nothing") == 0)
+ *snapshot_action = CRS_NOEXPORT_SNAPSHOT;
+ else if (strcmp(action, "use") == 0)
+ *snapshot_action = CRS_USE_SNAPSHOT;
+ else
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("conflicting or redundant options")));
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized value for CREATE_REPLICATION_SLOT option \"%s\": \"%s\"",
+ defel->defname, action)));
- snapshot_action_given = true;
- *snapshot_action = CRS_USE_SNAPSHOT;
}
else if (strcmp(defel->defname, "reserve_wal") == 0)
{
@@ -901,7 +905,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
errmsg("conflicting or redundant options")));
reserve_wal_given = true;
- *reserve_wal = true;
+ *reserve_wal = defGetBoolean(defel);
}
else if (strcmp(defel->defname, "two_phase") == 0)
{
@@ -910,7 +914,7 @@ parseCreateReplSlotOptions(CreateReplicationSlotCmd *cmd,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
two_phase_given = true;
- *two_phase = true;
+ *two_phase = defGetBoolean(defel);
}
else
elog(ERROR, "unrecognized option: %s", defel->defname);
@@ -980,7 +984,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'export')")));
need_full_snapshot = true;
}
@@ -990,25 +994,25 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called inside a transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
if (XactIsoLevel != XACT_REPEATABLE_READ)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called in REPEATABLE READ isolation mode transaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
if (FirstSnapshotSet)
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must be called before any query",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
if (IsSubTransaction())
ereport(ERROR,
/*- translator: %s is a CREATE_REPLICATION_SLOT statement */
(errmsg("%s must not be called in a subtransaction",
- "CREATE_REPLICATION_SLOT ... USE_SNAPSHOT")));
+ "CREATE_REPLICATION_SLOT ... (SNAPSHOT 'use')")));
need_full_snapshot = true;
}
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index d782b81adc..37237cd5d9 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -490,6 +490,7 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
{
PQExpBuffer query;
PGresult *res;
+ bool use_new_option_syntax = (PQserverVersion(conn) >= 150000);
query = createPQExpBuffer();
@@ -498,27 +499,54 @@ CreateReplicationSlot(PGconn *conn, const char *slot_name, const char *plugin,
Assert(!(two_phase && is_physical));
Assert(slot_name != NULL);
- /* Build query */
+ /* Build base portion of query */
appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
if (is_temporary)
appendPQExpBufferStr(query, " TEMPORARY");
if (is_physical)
- {
appendPQExpBufferStr(query, " PHYSICAL");
+ else
+ appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+
+ /* Add any requested options */
+ if (use_new_option_syntax)
+ appendPQExpBufferStr(query, " (");
+ if (is_physical)
+ {
if (reserve_wal)
- appendPQExpBufferStr(query, " RESERVE_WAL");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "RESERVE_WAL");
}
else
{
- appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
if (two_phase && PQserverVersion(conn) >= 150000)
- appendPQExpBufferStr(query, " TWO_PHASE");
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "TWO_PHASE");
if (PQserverVersion(conn) >= 100000)
+ {
/* pg_recvlogical doesn't use an exported snapshot, so suppress */
- appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+ if (use_new_option_syntax)
+ AppendStringCommandOption(query, use_new_option_syntax,
+ "SNAPSHOT", "nothing");
+ else
+ AppendPlainCommandOption(query, use_new_option_syntax,
+ "NOEXPORT_SNAPSHOT");
+ }
+ }
+ if (use_new_option_syntax)
+ {
+ /* Suppress option list if it would be empty, otherwise terminate */
+ if (query->data[query->len - 1] == '(')
+ {
+ query->len -= 2;
+ query->data[query->len] = '\0';
+ }
+ else
+ appendPQExpBufferChar(query, ')');
}
+ /* Now run the query */
res = PQexec(conn, query->data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
--
2.24.3 (Apple Git-128)
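
As a quick illustration for reviewers (assembled from the string-building
code above, not captured from a running server), the slot-creation commands
generated before and after this patch look roughly like this:

    -- legacy syntax, still accepted:
    CREATE_REPLICATION_SLOT "s" LOGICAL pgoutput TWO_PHASE EXPORT_SNAPSHOT
    CREATE_REPLICATION_SLOT "s" PHYSICAL RESERVE_WAL

    -- new options syntax:
    CREATE_REPLICATION_SLOT "s" LOGICAL pgoutput (SNAPSHOT 'export', TWO_PHASE)
    CREATE_REPLICATION_SLOT "s" PHYSICAL (RESERVE_WAL)

The parenthesized list goes through the generic_option_list grammar rule, so
future options shouldn't need new keywords in repl_gram.y.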
Attachment: v6-0004-Introduce-bbsink-abstraction-to-modularize-base-b.patch
From 4047e91b4009f088d413cfd82d3998a1b92a619b Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 11:45:50 -0400
Subject: [PATCH v6 4/8] Introduce 'bbsink' abstraction to modularize base
backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup process and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc', but in the future we might introduce
other options.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
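
To make the chaining concrete, here is a condensed sketch of how
perform_base_backup assembles the sinks in this version of the patch
(bbsink_state setup and error handling omitted):

    /* Innermost sink: speaks the COPY-based wire protocol. */
    bbsink *sink = bbsink_copytblspc_new();

    /* Optionally wrap it in a rate limiter... */
    if (opt->maxrate > 0)
        sink = bbsink_throttle_new(sink, opt->maxrate);

    /* ...and always wrap that in progress reporting. */
    sink = bbsink_progress_new(sink, opt->progress);

    /* The rest of basebackup.c pushes data through the chain's head. */
    bbsink_begin_backup(sink, &state, SINK_BUFFER_LENGTH);

Each wrapper does its own bookkeeping and then forwards to its bbs_next
sink, so basebackup.c itself only ever talks to the outermost object.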
---
src/backend/replication/Makefile | 4 +
src/backend/replication/backup_manifest.c | 28 +-
src/backend/replication/basebackup.c | 674 +++++-------------
src/backend/replication/basebackup_copy.c | 324 +++++++++
src/backend/replication/basebackup_progress.c | 250 +++++++
src/backend/replication/basebackup_sink.c | 115 +++
src/backend/replication/basebackup_throttle.c | 198 +++++
src/include/replication/backup_manifest.h | 5 +-
src/include/replication/basebackup_sink.h | 275 +++++++
src/tools/pgindent/typedefs.list | 4 +
10 files changed, 1363 insertions(+), 514 deletions(-)
create mode 100644 src/backend/replication/basebackup_copy.c
create mode 100644 src/backend/replication/basebackup_progress.c
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/backend/replication/basebackup_throttle.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..74b97cf126 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,10 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_copy.o \
+ basebackup_progress.o \
+ basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 04ca455ace..4fe11a3b5c 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -310,9 +311,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -352,38 +352,28 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
- *
- * We choose to read back the data from the temporary file in chunks of
- * size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
- * size, so it seems to make sense to match that value here.
+ * Send the backup manifest.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
- char manifestbuf[BLCKSZ];
size_t bytes_to_read;
size_t rc;
- bytes_to_read = Min(sizeof(manifestbuf),
+ bytes_to_read = Min(sink->bbs_buffer_length,
manifest->manifest_size - manifest_bytes_done);
- rc = BufFileRead(manifest->buffile, manifestbuf, bytes_to_read);
+ rc = BufFileRead(manifest->buffile, sink->bbs_buffer,
+ bytes_to_read);
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b31c36d918..0cd118f1f1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -46,6 +43,16 @@
#include "utils/resowner.h"
#include "utils/timestamp.h"
+/*
+ * How much data do we want to send in one CopyData message? Note that
+ * this may also result in reading the underlying files in chunks of this
+ * size.
+ *
+ * NB: The buffer size is required to be a multiple of the system block
+ * size, so use that value instead if it's bigger than our preference.
+ */
+#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+
typedef struct
{
const char *label;
@@ -59,27 +66,25 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
+static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -90,46 +95,12 @@ static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
-/*
- * Size of each block sent into the tar stream for larger files.
- */
-#define TAR_SEND_SIZE 32768
-
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
-/* The starting XLOG position of the base backup. */
-static XLogRecPtr startptr;
-
/* Total number of checksum failures during base backup. */
static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -255,30 +226,29 @@ static const struct exclude_list_item noChecksumFiles[] = {
static void
perform_base_backup(basebackup_options *opt)
{
- TimeLineID starttli;
+ bbsink_state state;
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- List *tablespaces = NIL;
+ bbsink *sink = bbsink_copytblspc_new();
+ bbsink *progress_sink;
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ /* Initial backup state, insofar as we know it now. */
+ state.tablespaces = NIL;
+ state.tablespace_num = 0;
+ state.bytes_done = 0;
+ state.bytes_total = 0;
+ state.bytes_total_is_valid = false;
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -295,11 +265,11 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
- startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
- tblspc_map_file);
+ basebackup_progress_wait_checkpoint();
+ state.startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint,
+ &state.starttli,
+ labelfile, &state.tablespaces,
+ tblspc_map_file);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -312,7 +282,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -329,7 +298,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = -1;
- tablespaces = lappend(tablespaces, ti);
+ state.tablespaces = lappend(state.tablespaces, ti);
/*
* Calculate the total backup size by summing up the size of each
@@ -337,100 +306,53 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
- NULL);
+ tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
+ true, NULL, NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
+ state.bytes_total += tmp->size;
}
+ state.bytes_total_is_valid = true;
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
-
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, &state, SINK_BUFFER_LENGTH);
/* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
+ sendDir(sink, ".", 1, false, state.tablespaces,
+ sendtblspclinks, &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -438,32 +360,33 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ Assert(lnext(state.tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
+ bbsink_end_archive(sink);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -489,8 +412,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -501,7 +423,7 @@ perform_base_backup(basebackup_options *opt)
* shouldn't be such files, but if there are, there's little harm in
* including them.
*/
- XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLByteToSeg(state.startptr, startsegno, wal_segment_size);
XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
@@ -591,7 +513,6 @@ perform_base_backup(basebackup_options *opt)
{
char *walFileName = (char *) lfirst(lc);
int fd;
- char buf[TAR_SEND_SIZE];
size_t cnt;
pgoff_t len = 0;
@@ -630,22 +551,17 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
- while ((cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf),
+ while ((cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length,
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -674,7 +590,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +613,23 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
+ AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
+ endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -739,7 +655,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -961,155 +877,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", LSN_FORMAT_ARGS(ptr));
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
+ int bytes_done = 0,
len;
pg_checksum_context checksum_ctx;
@@ -1135,25 +911,23 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
- update_basebackup_progress(len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
+ elog(ERROR, "could not update checksum of file \"%s\"",
+ filename);
+
+ while (bytes_done < len)
{
- char buf[TAR_BLOCK_SIZE];
+ size_t remaining = len - bytes_done;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
+ memcpy(sink->bbs_buffer, content, nbytes);
+ bbsink_archive_contents(sink, nbytes);
+ bytes_done += nbytes;
}
- if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
- elog(ERROR, "could not update checksum of file \"%s\"",
- filename);
+ _tarWritePadding(sink, len);
AddFileToBackupManifest(manifest, NULL, filename, len,
(pg_time_t) statbuf.st_mtime, &checksum_ctx);
@@ -1167,7 +941,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1197,11 +971,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1220,8 +994,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1381,8 +1155,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1399,8 +1173,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1413,15 +1187,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1452,7 +1226,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1476,7 +1250,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1508,7 +1282,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1516,7 +1290,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1593,21 +1367,19 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
bool block_retry = false;
- char buf[TAR_SEND_SIZE];
uint16 checksum;
int checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
@@ -1628,7 +1400,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1669,9 +1441,11 @@ sendFile(const char *readfilename, const char *tarfilename,
*/
while (len < statbuf->st_size)
{
+ size_t remaining = statbuf->st_size - len;
+
/* Try to read some more data. */
- cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf), statbuf->st_size - len),
+ cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length, remaining),
len, readfilename, true);
/*
@@ -1688,7 +1462,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read a multiple of
* BLCKSZ bytes.
*/
- Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
if (verify_checksum && (cnt % BLCKSZ != 0))
{
@@ -1704,7 +1478,7 @@ sendFile(const char *readfilename, const char *tarfilename,
{
for (i = 0; i < cnt / BLCKSZ; i++)
{
- page = buf + BLCKSZ * i;
+ page = sink->bbs_buffer + BLCKSZ * i;
/*
* Only check pages which have not been modified since the
@@ -1714,7 +1488,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* this case. We also skip completely new pages, since they
* don't have a checksum yet.
*/
- if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+ if (!PageIsNew(page) && PageGetLSN(page) < sink->bbs_state->startptr)
{
checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
phdr = (PageHeader) page;
@@ -1736,7 +1510,8 @@ sendFile(const char *readfilename, const char *tarfilename,
/* Reread the failed block */
reread_cnt =
- basebackup_read_file(fd, buf + BLCKSZ * i,
+ basebackup_read_file(fd,
+ sink->bbs_buffer + BLCKSZ * i,
BLCKSZ, len + BLCKSZ * i,
readfilename,
false);
@@ -1783,34 +1558,29 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
/* Also feed it to the checksum machinery. */
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer, cnt) < 0)
elog(ERROR, "could not update checksum of base backup");
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
- if (len < statbuf->st_size)
+ while (len < statbuf->st_size)
{
- MemSet(buf, 0, sizeof(buf));
- while (len < statbuf->st_size)
- {
- cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
- elog(ERROR, "could not update checksum of base backup");
- update_basebackup_progress(cnt);
- len += cnt;
- throttle(cnt);
- }
+ size_t remaining = statbuf->st_size - len;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
+
+ MemSet(sink->bbs_buffer, 0, nbytes);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ nbytes) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ bbsink_archive_contents(sink, nbytes);
+ len += nbytes;
}
/*
@@ -1818,13 +1588,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
- }
+ _tarWritePadding(sink, len);
CloseTransientFile(fd);
@@ -1847,18 +1611,28 @@ sendFile(const char *readfilename, const char *tarfilename,
return true;
}
-
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[TAR_BLOCK_SIZE];
enum tarError rc;
if (!sizeonly)
{
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ /*
+ * As of this writing, the smallest supported block size is 1kB, which
+ * is twice TAR_BLOCK_SIZE. Since the buffer size is required to be a
+ * multiple of BLCKSZ, it should be safe to assume that the buffer is
+ * large enough to fit an entire tar block. We double-check by means of
+ * these assertions.
+ */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= BLCKSZ,
+ "BLCKSZ too small for tar block");
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ rc = tarCreateHeader(sink->bbs_buffer, filename, linktarget,
+ statbuf->st_size, statbuf->st_mode,
+ statbuf->st_uid, statbuf->st_gid,
statbuf->st_mtime);
switch (rc)
@@ -1880,134 +1654,48 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
- update_basebackup_progress(sizeof(h));
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
}
- return sizeof(h);
-}
-
-/*
- * If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
- */
-static void
-convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
-{
- /* If symlink, write it as a directory anyway */
-#ifndef WIN32
- if (S_ISLNK(statbuf->st_mode))
-#else
- if (pgwin32_is_junction(pathbuf))
-#endif
- statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
+ return TAR_BLOCK_SIZE;
}
/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
+ * Pad with zero bytes out to a multiple of TAR_BLOCK_SIZE.
*/
static void
-throttle(size_t increment)
+_tarWritePadding(bbsink *sink, int len)
{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
+ int pad = tarPaddingBytesRequired(len);
/*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
+ * As in _tarWriteHeader, it should be safe to assume that the buffer is
+ * large enough that we don't need to do this in multiple chunks.
*/
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+ Assert(pad <= TAR_BLOCK_SIZE);
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
-
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
+ if (pad > 0)
+ {
+ MemSet(sink->bbs_buffer, 0, pad);
+ bbsink_archive_contents(sink, pad);
}
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
}
/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
static void
-update_basebackup_progress(int64 delta)
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
+ /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+ if (S_ISLNK(statbuf->st_mode))
+#else
+ if (pgwin32_is_junction(pathbuf))
+#endif
+ statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
/*
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
new file mode 100644
index 0000000000..564f010188
--- /dev/null
+++ b/src/backend/replication/basebackup_copy.c
@@ -0,0 +1,324 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_copy.c
+ * send basebackup archives using one COPY OUT operation per
+ * tablespace, and an additional COPY OUT for the backup manifest
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_copy.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_copytblspc_begin_backup(bbsink *sink);
+static void bbsink_copytblspc_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copytblspc_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_archive(bbsink *sink);
+static void bbsink_copytblspc_begin_manifest(bbsink *sink);
+static void bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_manifest(bbsink *sink);
+static void bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendTablespaceList(List *tablespaces);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+
+const bbsink_ops bbsink_copytblspc_ops = {
+ .begin_backup = bbsink_copytblspc_begin_backup,
+ .begin_archive = bbsink_copytblspc_begin_archive,
+ .archive_contents = bbsink_copytblspc_archive_contents,
+ .end_archive = bbsink_copytblspc_end_archive,
+ .begin_manifest = bbsink_copytblspc_begin_manifest,
+ .manifest_contents = bbsink_copytblspc_manifest_contents,
+ .end_manifest = bbsink_copytblspc_end_manifest,
+ .end_backup = bbsink_copytblspc_end_backup
+};
+
+/*
+ * Create a new 'copytblspc' bbsink.
+ */
+bbsink *
+bbsink_copytblspc_new(void)
+{
+ bbsink *sink = palloc0(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
+
+ return sink;
+}
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_copytblspc_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ /* Create a suitable buffer. */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is set as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_copytblspc_archive_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", LSN_FORMAT_ARGS(ptr));
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a result set via libpq describing the tablespace list.
+ */
+static void
+SendTablespaceList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..79f4d9dea3
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress tracking, including but not
+ * limited to command progress reporting.
+ *
+ * This should be used even if the PROGRESS option to the replication
+ * command BASE_BACKUP is not specified. Without that option, we won't
+ * have tallied up the size of the files that are going to need to be
+ * backed up, but we can still report to the command progress reporting
+ * facility how much data we've processed.
+ *
+ * Moreover, we also use this as a convenient place to update certain
+ * fields of the bbsink_state. That work is accurately described as
+ * keeping track of our progress, but it's not just for introspection.
+ * We need those fields to be updated properly in order for base backups
+ * to work.
+ *
+ * This particular basebackup sink requires extra callbacks that most base
+ * backup sinks don't. Rather than cramming those into the interface, we just
+ * have a few extra functions here that basebackup.c can call. (We could put
+ * the logic directly into that file as it's fairly simple, but it seems
+ * cleaner to have everything related to progress reporting in one place.)
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+static void bbsink_progress_begin_backup(bbsink *sink);
+static void bbsink_progress_archive_contents(bbsink *sink, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress tracking functions and
+ * forwards data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink));
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_progress_ops;
+ sink->bbs_next = next;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of the
+ * backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL, -1);
+
+ return sink;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+
+ /*
+ * Report that we are now streaming database files as a base backup. Also
+ * advertise the number of tablespaces, and, if known, the estimated total
+ * backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ if (sink->bbs_state->bytes_total_is_valid)
+ val[1] = sink->bbs_state->bytes_total;
+ else
+ val[1] = -1;
+ val[2] = list_length(sink->bbs_state->tablespaces);
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ bbsink_forward_begin_backup(sink);
+}
+
+/*
+ * End-of-archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ /*
+ * We expect one archive per tablespace, so reaching the end of an archive
+ * also means reaching the end of a tablespace. (Some day we might have a
+ * reason to decouple these concepts.)
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (sink->bbs_state->tablespace_num < list_length(sink->bbs_state->tablespaces))
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ sink->bbs_state->tablespace_num + 1);
+
+ /* Delegate to next sink. */
+ bbsink_forward_end_archive(sink);
+
+ /*
+ * This is a convenient place to update the bbsink_state's notion of which
+ * is the current tablespace. Note that the bbsink_state object is shared
+ * across all bbsink objects involved, but we're the outermost one and
+ * this is the very last thing we do.
+ */
+ sink->bbs_state->tablespace_num++;
+}
+
+/*
+ * Handle progress tracking for new archive contents.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+
+ /* First update bbsink_state with # of bytes done. */
+ state->bytes_done += len;
+
+ /* Now forward to next sink. */
+ bbsink_forward_archive_contents(sink, len);
+
+ /* Prepare to set # of bytes done for command progress reporting. */
+ val[nparam++] = state->bytes_done;
+
+ /*
+ * We may also want to update # of total bytes, to avoid overflowing past
+ * 100% or the full size. This may make the total size number change as we
+ * approach the end of the backup (the estimate will always be wrong if
+ * WAL is included), but that's better than having the done column be
+ * bigger than the total.
+ */
+ if (state->bytes_total_is_valid && state->bytes_done > state->bytes_total)
+ val[nparam++] = state->bytes_done;
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+ Assert(state->tablespace_num >= list_length(state->tablespaces) - 1);
+ Assert(state->tablespace_num <= list_length(state->tablespaces));
+
+ /*
+ * We report having finished all tablespaces at this point, even if the
+ * archive for the main tablespace is still open, because what's going to
+ * be added is WAL files, not files that are really from the main
+ * tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = list_length(state->tablespaces);
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..14104f50e8
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,115 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+/*
+ * Forward begin_backup callback.
+ *
+ * Only use this implementation if you want the bbsink you're implementing to
+ * share a buffer with the successor bbsink.
+ */
+void
+bbsink_forward_begin_backup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_state != NULL);
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+ sink->bbs_buffer = sink->bbs_next->bbs_buffer;
+}
+
+/*
+ * Forward begin_archive callback.
+ */
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+/*
+ * Forward archive_contents callback.
+ *
+ * Code that wants to use this should initialize its own bbs_buffer and
+ * bbs_buffer_length fields to the values from the successor sink. In cases
+ * where the buffer isn't shared, the data needs to be copied before forwarding
+ * the callback. We don't try to do that here, because there's really no
+ * reason to have separately allocated buffers containing the same
+ * data.
+ */
+void
+bbsink_forward_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_archive_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_archive callback.
+ */
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * Forward begin_manifest callback.
+ */
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward manifest_contents callback.
+ *
+ * As with the archive_contents callback, it's expected that the buffer is
+ * shared.
+ */
+void
+bbsink_forward_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_manifest callback.
+ */
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward end_backup callback.
+ */
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..1606463291
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The actual number of bytes, transfer of which may cause sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink);
+static void bbsink_throttle_archive_contents(bbsink *sink, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ bbsink_forward_begin_backup(sink);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only whole multiple of throttling_sample was
+ * processed. The rest will be done during the next call of this function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 099108910c..16ed7eec9b 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,7 +47,8 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
extern void FreeBackupManifest(backup_manifest_info *manifest);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..41c9c367f7
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,275 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * Taking a base backup produces one archive per tablespace directory,
+ * plus a backup manifest unless that feature has been disabled. The
+ * goal of the backup process is to put those archives and that manifest
+ * someplace, possibly after postprocessing them in some way. A 'bbsink'
+ * is an object to which those archives, and the manifest if present,
+ * can be sent.
+ *
+ * In practice, there will be a chain of 'bbsink' objects rather than
+ * just one, with callbacks being forwarded from one to the next,
+ * possibly with modification. Each object is responsible for a
+ * single task e.g. command progress reporting, throttling, or
+ * communication with the client.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Overall backup state shared by all bbsink objects for a backup.
+ *
+ * Before calling bbsink_begin_backup, the caller must initialize a bbsink_state
+ * object which will last for the lifetime of the backup, and must thereafter
+ * update it as required before each new call to a bbsink method. The bbsink
+ * will retain a pointer to the state object and will consult it to understand
+ * the progress of the backup.
+ *
+ * 'tablespaces' is a list of tablespaceinfo objects. It must be set before
+ * calling bbsink_begin_backup() and must not be modified thereafter.
+ *
+ * 'tablespace_num' is the index of the current tablespace within the list
+ * stored in 'tablespaces'.
+ *
+ * 'bytes_done' is the number of bytes read so far from $PGDATA.
+ *
+ * 'bytes_total' is the total number of bytes estimated to be present in
+ * $PGDATA, if we have estimated this.
+ *
+ * 'bytes_total_is_valid' is true if and only if a proper estimate has been
+ * stored into 'bytes_total'.
+ *
+ * 'startptr' and 'starttli' identify the point in the WAL stream at which
+ * the backup began. They must be set before calling bbsink_begin_backup()
+ * and must not be modified thereafter.
+ */
+typedef struct bbsink_state
+{
+ List *tablespaces;
+ int tablespace_num;
+ uint64 bytes_done;
+ uint64 bytes_total;
+ bool bytes_total_is_valid;
+ XLogRecPtr startptr;
+ TimeLineID starttli;
+} bbsink_state;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. Its length must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * 'bbs_state' is a pointer to the bbsink_state object for this backup.
+ * Every bbsink associated with this backup should point to the same
+ * underlying state object.
+ *
+ * In general it is expected that the values of these fields are set when
+ * a bbsink is created and that they do not change thereafter. It's OK
+ * to modify the data to which bbs_buffer or bbs_state point, but no changes
+ * should be made to the contents of this struct.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ char *bbs_buffer;
+ size_t bbs_buffer_length;
+ bbsink *bbs_next;
+ bbsink_state *bbs_state;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline
+ * functions rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /*
+ * This callback is invoked just once, at the very start of the backup.
+ * It must set bbs_buffer to point to a chunk of storage where at least
+ * bbs_buffer_length bytes of data can be written.
+ */
+ void (*begin_backup) (bbsink *sink);
+
+ /*
+ * For each archive transmitted to a bbsink, there will be one call to the
+ * begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ *
+ * Before invoking the archive_contents() callback, the caller should copy
+ * a number of bytes equal to what will be passed as len into bbs_buffer,
+ * but not more than bbs_buffer_length.
+ *
+ * It's generally good if the buffer is as full as possible before the
+ * archive_contents() callback is invoked, but it's not worth expending
+ * extra cycles to make sure it's absolutely 100% full.
+ */
+ void (*begin_archive) (bbsink *sink, const char *archive_name);
+ void (*archive_contents) (bbsink *sink, size_t len);
+ void (*end_archive) (bbsink *sink);
+
+ /*
+ * If a backup manifest is to be transmitted to a bbsink, there will be
+ * one call to the begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback. These calls will occur after all archives are transmitted.
+ *
+ * The rules for invoking the manifest_contents() callback are the same as
+ * for the archive_contents() callback above.
+ */
+ void (*begin_manifest) (bbsink *sink);
+ void (*manifest_contents) (bbsink *sink, size_t len);
+ void (*end_manifest) (bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup) (bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, bbsink_state *state, int buffer_length)
+{
+ Assert(sink != NULL);
+
+ Assert(buffer_length > 0);
+
+ sink->bbs_state = state;
+ sink->bbs_buffer_length = buffer_length;
+ sink->bbs_ops->begin_backup(sink);
+
+ Assert(sink->bbs_buffer != NULL);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /*
+ * The caller should make a reasonable attempt to fill the buffer before
+ * calling this function, so it shouldn't be completely empty. Nor should
+ * it be filled beyond capacity.
+ */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->archive_contents(sink, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /* See comments in bbsink_archive_contents. */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->manifest_contents(sink, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ Assert(sink->bbs_state->tablespace_num == list_length(sink->bbs_state->tablespaces));
+
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 402a6617a9..050dcfda15 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3766,3 +3766,7 @@ yyscan_t
z_stream
z_streamp
zic_t
+bbsink
+bbsink_ops
+bbsink_state
+bbsink_throttle
--
2.24.3 (Apple Git-128)
On Tue, Sep 21, 2021 at 9:08 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
Yes, you are right here, and I could verify this fact with an experiment.
When autoflush is 1, the file gets less compressed, i.e. the compressed file
is larger than the one generated when autoflush is set to 0.
But, as of now, I couldn't think of a solution, as we need the count of bytes
written to the output buffer to really advance so that we know where to write next.
I don't understand why you think we need to do that. What happens if
you just change prefs->autoFlush = 1 to set it to 0 instead? What I
think will happen is that you'll call LZ4F_compressUpdate a bunch of
times without outputting anything, and then suddenly one of the calls
will produce a bunch of output all at once. But so what? I don't see
that anything in bbsink_lz4_archive_contents() would get broken by
that.
It would be a problem if LZ4F_compressUpdate() didn't produce anything
and also didn't buffer the data internally, and expected us to keep
the input around. That we would have difficulty doing, because we
wouldn't be calling LZ4F_compressUpdate() if we didn't need to free up
some space in that sink's input buffer. But if it buffers the data
internally, I don't know why we care.
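To be concrete, here is the sort of pattern I have in mind, as an untested
sketch against the lz4frame.h API (the helper name and error handling here
are mine, not from the patch):

#include <lz4frame.h>

/*
 * Sketch only: with autoFlush = 0, LZ4F_compressUpdate() may return 0,
 * meaning the input was retained in the context's internal buffer rather
 * than emitted. The caller just advances its write position by however
 * many bytes were actually produced, which may be none.
 */
static size_t
sketch_compress_chunk(LZ4F_cctx *ctx,
                      char *out, size_t out_capacity, size_t out_used,
                      const char *in, size_t in_len)
{
    size_t      produced;

    produced = LZ4F_compressUpdate(ctx, out + out_used,
                                   out_capacity - out_used,
                                   in, in_len, NULL);
    if (LZ4F_isError(produced))
        return out_used;    /* real code would report LZ4F_getErrorName() */

    return out_used + produced; /* 0 is not an error; data is buffered */
}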
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 21, 2021 at 9:35 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
Here is a patch for lz4 based on the v5 set of patches. The patch adapts to
the bbsink changes, and is now able to make provision for the required output
buffer length using the new callback function bbsink_lz4_begin_backup().
Sample command to take a backup:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4
Please let me know your thoughts.
This pretty much looks right, with the exception of the autoFlush
thing about which I sent a separate email. I need to write docs for
all of this, and ideally test cases. It might also be good if
pg_basebackup had an option to un-gzip or un-lz4 archives, but I
haven't thought too hard about what would be required to make that
work.
+ if (opt->compression == BACKUP_COMPRESSION_LZ4)
else if
+ /* First of all write the frame header to destination buffer. */
+ Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ CHUNK_SIZE,
+ prefs);
I think this is wrong. I think you should be passing bbs_buffer_length
instead of CHUNK_SIZE, and I think you can just delete CHUNK_SIZE. If
you think otherwise, why?
+ * sink's bbs_buffer of length that can accomodate the compressed input
Spelling.
+ * Make it next multiple of BLCKSZ since the buffer length is expected so.
The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ * If we are falling short of available bytes needed by
+ * LZ4F_compressUpdate() per the upper bound that is decided by
+ * LZ4F_compressBound(), send the archived contents to the next sink to
+ * process it further.
If the number of available bytes has fallen below the value computed
by LZ4F_compressBound(), ask the next sink to process the data so that
we can empty the buffer.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 21, 2021 at 10:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
If I set prefs->autoFlush to 0, then LZ4F_compressUpdate() returns an
error: ERROR_dstMaxSize_tooSmall after a few iterations.
After digging a bit in the source of LZ4F_compressUpdate() in the LZ4
repository, I see that it throws this error when the destination buffer
capacity, which in our case is mysink->base.bbs_next->bbs_buffer_length, is
less than the compress bound that it calculates internally by calling
LZ4F_compressBound() for buffered_bytes + the input buffer (CHUNK_SIZE in
this case). Not sure how we can control this.
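In other words, my reading of the failing check amounts to something like
this (a paraphrase of the lz4 sources with a hypothetical helper name, not
actual lz4 code):

#include <stdbool.h>
#include <lz4frame.h>

/*
 * Paraphrase of the failing check: with autoFlush = 0, the destination
 * must have room for the worst case of the internally buffered bytes plus
 * the new input, or compressUpdate fails with ERROR_dstMaxSize_tooSmall.
 */
static bool
dst_large_enough(size_t dst_capacity, size_t buffered_bytes,
                 size_t src_size, const LZ4F_preferences_t *prefs)
{
    return dst_capacity >= LZ4F_compressBound(buffered_bytes + src_size,
                                              prefs);
}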
Regards,
Jeevan Ladhe
On Tue, Sep 21, 2021 at 10:50 PM Robert Haas <robertmhaas@gmail.com> wrote:
Thanks for your comments, Robert.
Here is the patch addressing the comments, except the one regarding the
autoFlush flag setting.
Kindly have a look.
Regards,
Jeevan Ladhe
Attachments:
lz4_compress_v3.patch (application/octet-stream)
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d6df3fdeb2..64641903bf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -303,6 +304,8 @@ perform_base_backup(basebackup_options *opt)
/* Set up server-side compression, if client requested it */
if (opt->compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt->compression_level);
+ else if (opt->compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -936,6 +939,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..508a5803a2
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,301 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Read the input buffer in chunks of CHUNK_SIZE bytes per iteration and pass
+ * each chunk to lz4 compression. Defined as 8kB, since the input buffer is a
+ * multiple of BLCKSZ, i.e. a multiple of 8kB.
+ */
+#define CHUNK_SIZE 8192
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+ size_t output_buffer_bound;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t next_buf_len;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Remember the compression bound for the input buffer, to avoid
+ * recomputing it in bbsink_lz4_archive_contents().
+ */
+ mysink->output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer of size equal to
+ * or greater than what LZ4F_compressBound() returns, make sure the next
+ * sink's bbs_buffer is long enough to accommodate the compressed input
+ * buffer.
+ */
+ next_buf_len = mysink->base.bbs_buffer_length + mysink->output_buffer_bound;
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ next_buf_len = next_buf_len + BLCKSZ - (next_buf_len % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, next_buf_len);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+ size_t headerSize;
+
+ /* Initialize compressor object. */
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->frameInfo.blockMode = LZ4F_blockLinked;
+ prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+ prefs->frameInfo.frameType = LZ4F_frame;
+ prefs->frameInfo.contentSize = 0;
+ prefs->frameInfo.dictID = 0;
+ prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+ prefs->compressionLevel = 0;
+
+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into the output
+ * buffer, and we need to keep track of how many bytes have been cumulatively
+ * written into the output buffer (bytes_written). But LZ4F_compressUpdate()
+ * returns 0 in case the data is merely buffered and not written to the
+ * output buffer, so set autoFlush to 1 to force writing to the output
+ * buffer.
+ */
+ prefs->autoFlush = 1;
+
+ prefs->favorDecSpeed = 0;
+ prefs->reserved[0] = 0;
+ prefs->reserved[1] = 0;
+ prefs->reserved[2] = 0;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer's free space falls below the compression
+ * bound for the input, invoke the archive_contents() method of the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ uint8 *next_in = (uint8 *) mysink->base.bbs_buffer;
+
+ while (avail_in > 0)
+ {
+ size_t compressedSize;
+ int nextChunkLen = CHUNK_SIZE;
+
+ /* Last chunk to be read from the input. */
+ if (avail_in < CHUNK_SIZE)
+ nextChunkLen = avail_in;
+
+ /*
+ * Read nextChunkLen bytes of data from the input buffer and write the
+ * compressed output into the unused portion of the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ next_in,
+ nextChunkLen,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Advance the input start since we already read some data. */
+ next_in = (uint8 *) next_in + nextChunkLen;
+ avail_in = avail_in - nextChunkLen;
+
+ /*
+ * If the number of available bytes has fallen below the value computed
+ * by LZ4F_compressBound(), ask the next sink to process the data so
+ * that we can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length -
+ mysink->bytes_written) < mysink->output_buffer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * Finalize the lz4 frame and then get that forwarded to the successor sink
+ * as archive content. Then, we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index f09aecb53b..84dc305d56 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -264,6 +264,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
On Wed, Sep 22, 2021 at 12:41 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
Uggh. My guess had been that the reason why
LZ4F_compressBound() was returning such a large value was that it
had to allow for the possibility of bytes held inside its internal
buffers. But, if the amount of internally buffered data counts against
the argument that you have to pass to LZ4F_compressBound(), then that
makes it more complicated.
Still, there's got to be a simple way to make this work, and it can't
involve setting autoFlush. Like, look at this:
https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c
That uses the same APIs that we're using here and a fixed-size input buffer
and a fixed-size output buffer, just as we have here, to compress a
file. And it probably works, because otherwise it likely wouldn't be
in the "examples" directory. And it sets autoFlush to 0.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks, Robert. I have seen this example, and it is similar to what we have.
I went through each of the steps and it appears that I have done it correctly.
I am still trying to debug and figure out where it is going wrong.
I am going to try hooking pg_basebackup up with the lz4 source and
debugging both.
Regards,
Jeevan Ladhe
Hi Robert,
I have fixed the autoFlush issue. Basically, I was wrongly initializing
the lz4 preferences in bbsink_lz4_begin_archive() instead of
bbsink_lz4_begin_backup(). The attached patch fixes this; please have a
look at it.
Regards,
Jeevan Ladhe
Attachments:
lz4_compress_v4.patch (application/octet-stream)
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d6df3fdeb2..64641903bf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -303,6 +304,8 @@ perform_base_backup(basebackup_options *opt)
/* Set up server-side compression, if client requested it */
if (opt->compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt->compression_level);
+ else if (opt->compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -936,6 +939,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..85f51fea4d
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,291 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Read the input buffer in chunks of CHUNK_SIZE bytes per iteration and pass
+ * each chunk to lz4 compression. Defined as 8kB, since the input buffer is a
+ * multiple of BLCKSZ, i.e. a multiple of 8kB.
+ */
+#define CHUNK_SIZE 8192
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+ size_t output_buffer_bound;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t next_buf_len;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->frameInfo.blockMode = LZ4F_blockLinked;
+ prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+ prefs->frameInfo.frameType = LZ4F_frame;
+ prefs->frameInfo.contentSize = 0;
+ prefs->frameInfo.dictID = 0;
+ prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+ prefs->compressionLevel = 0;
+ prefs->autoFlush = 0;
+ prefs->favorDecSpeed = 0;
+ prefs->reserved[0] = 0;
+ prefs->reserved[1] = 0;
+ prefs->reserved[2] = 0;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Remember the compression bound for the input buffer, to avoid
+ * recomputing it in bbsink_lz4_archive_contents().
+ */
+ mysink->output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer of size equal to
+ * or greater than what LZ4F_compressBound() returns, make sure the next
+ * sink's bbs_buffer is long enough to accommodate the compressed input
+ * buffer.
+ */
+ next_buf_len = mysink->base.bbs_buffer_length + mysink->output_buffer_bound;
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ next_buf_len = next_buf_len + BLCKSZ - (next_buf_len % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, next_buf_len);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer's free space falls below the compression
+ * bound for the input, invoke the archive_contents() method of the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ uint8 *next_in = (uint8 *) mysink->base.bbs_buffer;
+
+ while (avail_in > 0)
+ {
+ size_t compressedSize;
+ int nextChunkLen = CHUNK_SIZE;
+
+ /* Last chunk to be read from the input. */
+ if (avail_in < CHUNK_SIZE)
+ nextChunkLen = avail_in;
+
+ /*
+ * Read nextChunkLen bytes of data from the input buffer and write the
+ * compressed output into the unused portion of the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ next_in,
+ nextChunkLen,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Advance the input start since we already read some data. */
+ next_in = (uint8 *) next_in + nextChunkLen;
+ avail_in = avail_in - nextChunkLen;
+
+ /*
+ * If the number of available bytes has fallen below the value computed
+ * by LZ4F_compressBound(), ask the next sink to process the data so
+ * that we can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length -
+ mysink->bytes_written) < mysink->output_buffer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * Finalize the lz4 frame and then get that forwarded to the successor sink
+ * as archive content. Then, we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index f09aecb53b..84dc305d56 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -264,6 +264,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
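(An aside for readers who have not used the LZ4 frame API this sink is built
on: the begin/update/end pattern can be sketched as a small stand-alone
program, shown here. This is purely illustrative and not part of the patch;
the chunk and buffer sizes are invented and error handling is minimal. Like
the sink, it flushes the output buffer whenever the free space falls below
LZ4F_compressBound() for one input chunk.)

#include <stdio.h>
#include <stdlib.h>
#include <lz4frame.h>

#define IN_CHUNK	8192
#define OUT_BUFLEN	(64 * 1024)

int
main(void)
{
	LZ4F_cctx  *ctx;
	char		in[IN_CHUNK];
	char	   *out = malloc(OUT_BUFLEN);
	size_t		bound = LZ4F_compressBound(IN_CHUNK, NULL);
	size_t		written;
	size_t		n;
	size_t		rc;

	if (LZ4F_isError(LZ4F_createCompressionContext(&ctx, LZ4F_VERSION)))
		exit(1);

	/* The frame header is written at the start of the output buffer. */
	written = LZ4F_compressBegin(ctx, out, OUT_BUFLEN, NULL);
	if (LZ4F_isError(written))
		exit(1);

	while ((n = fread(in, 1, IN_CHUNK, stdin)) > 0)
	{
		/* Flush when free space falls below the compression bound. */
		if (OUT_BUFLEN - written < bound)
		{
			fwrite(out, 1, written, stdout);
			written = 0;
		}

		rc = LZ4F_compressUpdate(ctx, out + written, OUT_BUFLEN - written,
								 in, n, NULL);
		if (LZ4F_isError(rc))
			exit(1);
		/* rc may be 0: lz4 can buffer input internally, as noted above. */
		written += rc;
	}

	/* Make room for the frame footer, then finalize the frame. */
	if (OUT_BUFLEN - written < bound)
	{
		fwrite(out, 1, written, stdout);
		written = 0;
	}
	rc = LZ4F_compressEnd(ctx, out + written, OUT_BUFLEN - written, NULL);
	if (LZ4F_isError(rc))
		exit(1);
	fwrite(out, 1, written + rc, stdout);

	LZ4F_freeCompressionContext(ctx);
	free(out);
	return 0;
}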
Hi Robert,
I think the patch v6-0007-Support-base-backup-targets.patch has broken
the case for multiple tablespaces. When I tried to take a backup with
target 'none' and extract base.tar, I was not able to locate the
tablespace_map file.

I debugged this and figured out that in the normal tar backup case,
i.e. '-Ft', the pg_basebackup command is sent with TABLESPACE_MAP to
the server:

BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,
TABLESPACE_MAP, MANIFEST 'yes', TARGET 'client')

But with a backup target, i.e. "pg_basebackup -t server:/tmp/data_v1
-Xnone", we are not sending TABLESPACE_MAP; here is how the command
is sent:

BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TARGET 'server', TARGET_DETAIL '/tmp/data_none')

I am attaching a patch to fix this issue.

With the patch, the command sent is now:

BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TABLESPACE_MAP, TARGET 'server', TARGET_DETAIL '/tmp/data_none')
Regards,
Jeevan Ladhe
On Tue, Sep 21, 2021 at 10:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 21, 2021 at 7:54 AM Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
> > I was wondering if we should change the bbs_buffer_length in bbsink to
> > be size_t instead of int, because that's what most of the compression
> > libraries have their length variables defined as.
>
> I looked into this and found that I was already using size_t or Size
> in a bunch of related places, so this seems to make sense.
>
> Here's a new patch set, responding also to Sergei's comments.
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
Attachments:
fix_missing_tablespace_map_issue.patch
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 79d0e5cb9d..239269df08 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1738,6 +1738,8 @@ BaseBackup(void)
exit(1);
}
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
if ((colon = strchr(backup_target, ':')) == NULL)
{
AppendStringCommandOption(&buf, use_new_option_syntax,
On Thu, Oct 7, 2021 at 7:50 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I think the patch v6-0007-Support-base-backup-targets.patch has broken
> the case for multiple tablespaces. When I tried to take a backup with
> target 'none' and extract base.tar, I was not able to locate the
> tablespace_map file.
>
> I debugged this and figured out that in the normal tar backup case,
> i.e. '-Ft', the pg_basebackup command is sent with TABLESPACE_MAP to
> the server:
>
> BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,
> TABLESPACE_MAP, MANIFEST 'yes', TARGET 'client')
>
> But with a backup target, i.e. "pg_basebackup -t server:/tmp/data_v1
> -Xnone", we are not sending TABLESPACE_MAP; here is how the command
> is sent:
>
> BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
> 'yes', TARGET 'server', TARGET_DETAIL '/tmp/data_none')
>
> I am attaching a patch to fix this issue.
Thanks. Here's a new patch set incorporating that change. I committed
the preparatory patches to add an extensible options syntax for
CREATE_REPLICATION_SLOT and BASE_BACKUP, so those patches are no
longer included in this patch set. Barring objections, I will also
push 0001, a small preparatory refactoring patch, soon.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v7-0001-Refactor-basebackup.c-s-_tarWriteDir-function.patch
From 0108407c93a2331b2d03fb1aa97c0a9a6cba1b5e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 1 May 2020 14:36:57 -0400
Subject: [PATCH v7 1/6] Refactor basebackup.c's _tarWriteDir() function.
Sometimes, we replace a symbolic link that we find in the data
directory with an actual directory within the tarfile that we
create. _tarWriteDir was responsible both for making this
substitution and also for writing the tar header for the
resulting directory into the tar file. Make it do only the first
of those things, and rename to convert_link_to_directory.
Substantially larger refactoring of this source file is planned,
but this little bit seemed to make sense to commit
independently.
---
src/backend/replication/basebackup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 4c97ab7b5a..b31c36d918 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -71,8 +71,7 @@ static void sendFileWithContent(const char *filename, const char *content,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
-static int64 _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly);
+static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void send_int8_string(StringInfoData *buf, int64 intval);
static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
@@ -1381,7 +1380,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
excludeFound = true;
break;
}
@@ -1397,7 +1398,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (statrelpath != NULL && strcmp(pathbuf, statrelpath) == 0)
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
continue;
}
@@ -1409,7 +1412,9 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (strcmp(pathbuf, "./pg_wal") == 0)
{
/* If pg_wal is a symlink, write it as a directory anyway */
- size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
+ convert_link_to_directory(pathbuf, &statbuf);
+ size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
@@ -1883,12 +1888,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
}
/*
- * Write tar header for a directory. If the entry in statbuf is a link then
- * write it as a directory anyway.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
-static int64
-_tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
- bool sizeonly)
+static void
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
/* If symlink, write it as a directory anyway */
#ifndef WIN32
@@ -1897,8 +1901,6 @@ _tarWriteDir(const char *pathbuf, int basepathlen, struct stat *statbuf,
if (pgwin32_is_junction(pathbuf))
#endif
statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-
- return _tarWriteHeader(pathbuf + basepathlen + 1, NULL, statbuf, sizeonly);
}
/*
--
2.24.3 (Apple Git-128)
v7-0004-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
From 1fd8a55fe442c870a348919d653051a530bed04b Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 9 Sep 2021 14:53:04 -0400
Subject: [PATCH v7 4/6] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
---
src/backend/replication/basebackup.c | 62 ++-
src/backend/replication/basebackup_copy.c | 266 ++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
src/tools/pgindent/typedefs.list | 3 +
5 files changed, 722 insertions(+), 53 deletions(-)
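To make the new framing concrete, here is an illustrative, stand-alone
sketch -- not part of the patch, with invented names -- of how a receiver
might dispatch on the type byte that leads each CopyData payload. The real
client-side logic is in ReceiveArchiveStreamChunk() in the diff below; in
particular, real code must bounds-check the strings in the 'n' message
against the payload length, as GetCopyDataString() does.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode 8 bytes in network byte order into a host uint64. */
static uint64_t
read_be64(const unsigned char *p)
{
	uint64_t	v = 0;
	int			i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Dispatch one CopyData payload of the copy-stream protocol. */
static void
dispatch_copydata(const char *buf, size_t len)
{
	if (len < 1)
		return;					/* malformed; real code reports an error */

	switch (buf[0])
	{
		case 'n':				/* new archive */
			{
				/* Two NUL-terminated strings: archive name, tablespace path. */
				const char *archive_name = buf + 1;
				const char *spclocation =
					archive_name + strlen(archive_name) + 1;

				printf("new archive \"%s\" (tablespace \"%s\")\n",
					   archive_name, spclocation);
				break;
			}
		case 'd':				/* archive or manifest data */
			printf("%zu bytes of content\n", len - 1);
			break;
		case 'p':				/* progress report */
			if (len >= 9)
				printf("bytes done: %llu\n", (unsigned long long)
					   read_be64((const unsigned char *) buf + 1));
			break;
		case 'm':				/* manifest data follows */
			printf("manifest starts\n");
			break;
		default:
			printf("unknown message type %d\n", buf[0]);
			break;
	}
}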
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0cd118f1f1..7fb7b1cf66 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -81,6 +88,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -233,7 +241,7 @@ perform_base_backup(basebackup_options *opt)
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- bbsink *sink = bbsink_copytblspc_new();
+ bbsink *sink;
bbsink *progress_sink;
/* Initial backup state, insofar as we know it now. */
@@ -243,6 +251,16 @@ perform_base_backup(basebackup_options *opt)
state.bytes_total = 0;
state.bytes_total_is_valid = false;
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -383,7 +401,10 @@ perform_base_backup(basebackup_options *opt)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(progress_sink);
@@ -621,6 +642,7 @@ perform_base_backup(basebackup_options *opt)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -688,8 +710,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -830,6 +854,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -1682,6 +1722,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that system, the
+ * server must not properly terminate the archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 564f010188..389a520417 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result set has the same format in both methods.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,51 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -37,6 +101,17 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -48,6 +123,193 @@ const bbsink_ops bbsink_copytblspc_ops = {
.end_backup = bbsink_copytblspc_end_backup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins
+ * with the type byte we're going to need, and then arrange things so
+ * that the data we're given will be written just after that type byte.
+ * That will allow us to ship the data with a single call to pq_putmessage
+ * and without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 67d01d8b6e..0a9eb8ca7e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -167,6 +177,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -978,10 +995,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1008,8 +1026,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1049,16 +1067,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1069,8 +1087,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1080,6 +1098,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1333,28 +1662,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1477,46 +1810,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 41c9c367f7..31a6d2251c 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,6 +261,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bd08cab6f1..5838af73bb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3767,7 +3767,10 @@ yyscan_t
z_stream
z_streamp
zic_t
+ArchiveStreamState
+backup_target_type
bbsink
+bbsink_copystream
bbsink_ops
bbsink_state
bbsink_throttle
--
2.24.3 (Apple Git-128)
v7-0005-Support-base-backup-targets.patch
From 1334dc2dbb2bd9341dc2a2658435deb3d6f845bc Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 1 Jul 2021 14:56:52 -0400
Subject: [PATCH v7 5/6] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
Patch by me, with a bug fix by Jeevan Ladhe.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 301 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 199 +++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 558 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
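As an illustrative aside -- not part of the patch, names invented --
splitting the user-supplied TARGET[:DETAIL] value is just a matter of
finding the first colon, which is what the pg_basebackup changes below do
with strchr():

#include <stdlib.h>
#include <string.h>

/*
 * Split an argument like "server:/some/path" into a target ("server")
 * and a detail ("/some/path"). A value with no colon, like "blackhole",
 * has a NULL detail. Returns a malloc'd copy of the target; *detail
 * points into that same allocation.
 */
static char *
split_target(const char *arg, char **detail)
{
	char	   *target = strdup(arg);
	char	   *colon = strchr(target, ':');

	if (colon == NULL)
		*detail = NULL;
	else
	{
		*colon = '\0';
		*detail = colon + 1;
	}
	return target;
}

So "server:/tmp/data_v1" yields TARGET 'server' with TARGET_DETAIL
'/tmp/data_v1', while "blackhole" yields no detail -- matching what the
server-side option parser accepts and rejects.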
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 7fb7b1cf66..ed16c6861f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -253,14 +256,38 @@ perform_base_backup(basebackup_options *opt)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt->target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt->target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt->target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt->target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt->target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
@@ -711,6 +738,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -856,25 +885,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -886,6 +925,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 389a520417..9104455700 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -127,11 +130,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -204,8 +208,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -286,8 +294,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..dff930c3c9
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,301 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write a chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index 1606463291..d1927e4f81 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -121,7 +121,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index ef7e6bfb77..a910915ccd 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0a9eb8ca7e..f5d5d918a2 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -109,7 +109,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -126,6 +126,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -357,6 +358,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1216,15 +1219,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1296,24 +1306,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1684,7 +1702,35 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1779,8 +1825,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first once since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1791,7 +1842,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1874,7 +1926,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2008,8 +2060,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2031,7 +2086,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2065,6 +2120,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2115,7 +2171,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2156,6 +2212,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2288,18 +2347,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2309,6 +2400,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2325,6 +2426,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2358,8 +2462,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2380,6 +2494,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2387,6 +2502,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2396,6 +2514,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2438,11 +2559,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 31a6d2251c..7365b39e23 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -261,9 +261,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
TimeLineID endtli);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6007827b44..6af924b6d4 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5838af73bb..abb66d4494 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3772,6 +3772,7 @@ backup_target_type
bbsink
bbsink_copystream
bbsink_ops
+bbsink_server
bbsink_state
bbsink_throttle
bbstreamer
--
2.24.3 (Apple Git-128)
Attachment: v7-0006-WIP-Server-side-gzip-compression.patch (application/octet-stream)
From 1324cb299942853d6ef0a74d00aefba105ce9a95 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 13 Sep 2021 12:07:01 -0400
Subject: [PATCH v7 6/6] WIP: Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
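For example, a hypothetical invocation that satisfies those
restrictions, writing a server-compressed tar archive to stdout at
compression level 4:

    pg_basebackup -Ft -D - --no-manifest --server-compression=gzip4 \
        > base.tar.gz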
---
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 303 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 38 ++-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 382 insertions(+), 2 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ed16c6861f..61c76160d1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -292,6 +300,10 @@ perform_base_backup(basebackup_options *opt)
if (opt->maxrate > 0)
sink = bbsink_throttle_new(sink, opt->maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt->compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt->compression_level);
+
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -740,11 +752,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -914,6 +928,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..3d2fa93e55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,303 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As in bbsink_gzip_archive_contents(), but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index f5d5d918a2..e87c2b7a4d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -987,7 +988,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -996,14 +999,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1734,6 +1755,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2144,6 +2176,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2323,6 +2356,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 7365b39e23..10a316cacd 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -263,6 +263,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
Attachment: v7-0003-Introduce-bbstreamer-abstraction-to-modularize-pg.patch (application/octet-stream)
From 9642a68be5a255568834b6eefe91b39441d506d8 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 12:00:34 -0400
Subject: [PATCH v7 3/6] Introduce 'bbstreamer' abstraction to modularize
pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unfortunately, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and then gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These changes do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 +++++
src/bin/pg_basebackup/bbstreamer_file.c | 579 ++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 912 +++++-----------------
src/tools/pgindent/typedefs.list | 10 +
7 files changed, 1697 insertions(+), 727 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..8fda09dcd4 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -34,10 +34,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -60,7 +66,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..b24dc848c1
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There
+ * should be exactly one BBSTREAMER_ARCHIVE_TRAILER chunk, and it should
+ * follow the last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content) (bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize) (bbstreamer *streamer);
+ void (*free) (bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..03e1ea2550
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,579 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include <unistd.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map) (const char *);
+ void (*report_output_file) (const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error-reporting purposes; if file is NULL, the file named by the
+ * pathname is also opened and closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new() we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any
+ * symbolic link, and which should return a replacement pathname to be used
+ * in its place. If NULL, the symbolic link target is used without
+ * modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a
+ * new output file. The pathname to that file is passed as an argument. If
+ * NULL, the call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = pstrdup(basepath);
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6 clusters) will
+ * have been created by the wal receiver process. Also, when the WAL
+ * directory location was specified, pg_wal (or pg_xlog) has already
+ * been created as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ pfree(mystreamer->basepath);
+ pfree(mystreamer);
+}
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..4d15251fdc
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can inject recovery configuration into an
+ * archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+extern bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
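+
As an example of intended use, the caller in pg_basebackup.c constructs this as follows (condensed; recoveryconfcontents comes from GenerateRecoveryConfig()):

    streamer = bbstreamer_recovery_injector_new(streamer,
                                                PQserverVersion(conn) >=
                                                MINIMUM_VERSION_FOR_RECOVERY_GUC,
                                                recoveryconfcontents);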
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf; on
+ * older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
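+
For example, the pg_basebackup changes below use this to add the backup manifest when a single tarfile is being written to stdout:

    bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
                           buf.data, buf.len);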
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..5a9f587dca
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,444 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+extern bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+
+ /*
+ * If we're expecting an archive member header, accumulate a
+ * full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the file
+ * trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ if (mystreamer->base.bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not the
+ * start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
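+
For reviewers who don't have pgtar.h handy: read_tar_number() essentially parses an octal text field. A simplified sketch of the idea, not the real implementation, which also handles the base-256 encoding used for very large values:

    /* Simplified sketch of read_tar_number() from pgtar.h. */
    static uint64
    parse_octal_field(const char *s, int len)
    {
        uint64 result = 0;

        /* e.g. the 12-byte size field "00000001750" yields 1000 */
        while (len-- > 0 && *s >= '0' && *s <= '7')
            result = (result << 3) | (*s++ - '0');
        return result;
    }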
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+}
+
+/*
+ * Create an bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+extern bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size is a fatal error (and
+ * indicates a bug).
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two blocks of zero
+ * bytes, which some (though not all) tar programs require. The server does
+ * not supply this trailer itself; if the input contains no archive trailer,
+ * bbstreamer_tar_parser_finalize will have supplied an empty trailer chunk,
+ * which gets replaced here just the same.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+ /* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
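+
The padding math here relies on tarPaddingBytesRequired() from pgtar.h, which rounds up to the 512-byte block size. Conceptually (my paraphrase, not the patch text):

    static inline size_t
    tar_padding_bytes(size_t len)
    {
        /* e.g. len = 1000 gives 24, since 1000 + 24 = 2 * 512 */
        return ((len + TAR_BLOCK_SIZE - 1) &
                ~((size_t) (TAR_BLOCK_SIZE - 1))) - len;
    }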
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
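+
Putting these pieces together: for a plain-format backup, the new pg_basebackup code (below) builds a chain like the following and then pushes raw COPY data into its head. Condensed from CreateBackupStreamer and ReceiveTarCopyChunk:

    streamer = bbstreamer_extractor_new(basedir, get_tablespace_mapping,
                                        progress_update_filename);
    streamer = bbstreamer_recovery_injector_new(streamer,
                                                is_recovery_guc_supported,
                                                recoveryconfcontents);
    streamer = bbstreamer_tar_parser_new(streamer);
    ...
    bbstreamer_content(streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);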
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 27ee6394cf..67d01d8b6e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -28,18 +28,13 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
-#include "common/string.h"
#include "fe_utils/option_utils.h"
#include "fe_utils/recovery_gen.h"
-#include "fe_utils/string_utils.h"
#include "getopt_long.h"
-#include "libpq-fe.h"
-#include "pgtar.h"
-#include "pgtime.h"
-#include "pqexpbuffer.h"
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
@@ -62,34 +57,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -161,10 +131,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -190,14 +161,15 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force,
- bool finished);
-
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force, bool finished);
+
+static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported);
+static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -360,21 +332,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -763,6 +720,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -775,8 +740,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* is moved to the next line.
*/
static void
-progress_report(int tablespacenum, const char *filename,
- bool force, bool finished)
+progress_report(int tablespacenum, bool force, bool finished)
{
int percent;
char totaldone_str[32];
@@ -811,7 +775,7 @@ progress_report(int tablespacenum, const char *filename,
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -827,7 +791,7 @@ progress_report(int tablespacenum, const char *filename,
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -841,7 +805,7 @@ progress_report(int tablespacenum, const char *filename,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -987,257 +951,170 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
}
/*
- * Write a piece of tar data
+ * Figure out what to do with an archive received from the server based on
+ * the options selected by the user. We may just write the results directly
+ * to a file, or we might compress first, or we might extract the tar file
+ * and write each member separately. This function doesn't do any of that
+ * directly, but it works out what kind of bbstreamer we need to create so
+ * that the right stuff happens when, down the road, we actually receive
+ * the data.
*/
-static void
-writeTarData(WriteTarState *state, char *buf, int r)
+static bbstreamer *
+CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported)
{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-}
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer = NULL;
+ bool inject_manifest;
+ bool must_parse_archive;
-/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
- *
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
- */
-static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- char zerobuf[TAR_BLOCK_SIZE * 2];
- WriteTarState state;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
+ /*
+ * Normally, we emit the backup manifest as a separate file, but when
+ * we're writing a tarfile to stdout, we don't have that option, so
+ * include it in the one tarfile we've got.
+ */
+ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ /*
+ * We have to parse the archive if (1) we're supposed to extract it, or
+ * (2) we need to inject the backup_manifest or recovery configuration into it.
+ */
+ must_parse_archive = (format == 'p' || inject_manifest ||
+ (spclocation == NULL && writerecoveryconf));
- if (state.basetablespace)
+ if (format == 'p')
{
+ const char *directory;
+
/*
- * Base tablespaces
+ * In plain format, we must extract the archive. The data for the main
+ * tablespace will be written to the base directory, and the data for
+ * other tablespaces will be written to the directory where they're
+ * located on the server, after applying any user-specified tablespace
+ * mappings.
*/
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
-
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
- else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ directory = spclocation == NULL ? basedir
+ : get_tablespace_mapping(spclocation);
+ streamer = bbstreamer_extractor_new(directory,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
+ FILE *archive_file;
+ char archive_filename[MAXPGPATH];
+
/*
- * Specific tablespace
+ * In tar format, we just write the archive without extracting it.
+ * Normally, we write it to the archive name provided by the caller,
+ * but when the base directory is "-" that means we need to write
+ * to standard output.
*/
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(archive_filename, sizeof(archive_filename), "-");
+ archive_file = stdout;
}
else
-#endif
{
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
+ snprintf(archive_filename, sizeof(archive_filename),
+ "%s/%s", basedir, archive_name);
+ archive_file = NULL;
}
- }
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(archive_filename, ".gz", sizeof(archive_filename));
+ streamer = bbstreamer_gzip_writer_new(archive_filename,
+ archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+
+ /*
+ * If we need to parse the archive for whatever reason, then we'll
+ * also need to re-archive, because, if the output format is tar, the
+ * only point of parsing the archive is to be able to inject stuff
+ * into it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = archive_filename;
+ }
/*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
+ * If we're supposed to inject the backup manifest into the results,
+ * it should be done here, so that the file content can be injected
+ * directly, without worrying about the details of the tar format.
*/
+ if (inject_manifest)
+ manifest_inject_streamer = streamer;
- MemSet(zerobuf, 0, sizeof(zerobuf));
-
- if (state.basetablespace && writerecoveryconf)
+ /*
+ * If this is the main tablespace and we're supposed to write
+ * recovery information, arrange to do that.
+ */
+ if (spclocation == NULL && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ Assert(must_parse_archive);
+ streamer = bbstreamer_recovery_injector_new(streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ /*
+ * If we're doing anything that involves understanding the contents of
+ * the archive, we'll need to parse it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_parser_new(streamer);
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ /* Return the results. */
+ *manifest_inject_streamer_p = manifest_inject_streamer;
+ return streamer;
+}
- writeTarData(&state, header, sizeof(header));
+/*
+ * Receive raw tar data from the server, and stream it to the appropriate
+ * location. If we're writing a single tarfile to standard output, also
+ * receive the backup manifest and inject it into that tarfile.
+ */
+static void
+ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum)
+{
+ WriteTarState state;
+ bbstreamer *manifest_inject_streamer;
+ bool is_recovery_guc_supported;
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ /* Pass all COPY data through to the backup streamer. */
+ memset(&state, 0, sizeof(state));
+ is_recovery_guc_supported =
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ state.streamer = CreateBackupStreamer(archive_name, spclocation,
+ &manifest_inject_streamer,
+ is_recovery_guc_supported);
+ state.tablespacenum = tablespacenum;
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ progress_filename = NULL;
/*
- * Normally, we emit the backup manifest as a separate file, but when
- * we're writing a tarfile to stdout, we don't have that option, so
- * include it in the one tarfile we've got.
+ * The decision as to whether we need to inject the backup manifest into
+ * the output at this stage is made by CreateBackupStreamer; if that is
+ * needed, manifest_inject_streamer will be non-NULL; otherwise, it will
+ * be NULL.
*/
- if (strcmp(basedir, "-") == 0 && manifest)
+ if (manifest_inject_streamer != NULL)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
+ /* Slurp the entire backup manifest into a buffer. */
initPQExpBuffer(&buf);
ReceiveBackupManifestInMemory(conn, &buf);
if (PQExpBufferDataBroken(buf))
@@ -1245,42 +1122,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
- termPQExpBuffer(&buf);
- }
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
+ /* Inject it into the output tarfile. */
+ bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ buf.data, buf.len);
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
+ /* Free memory. */
+ termPQExpBuffer(&buf);
}
- progress_report(rownum, state.filename, true, false);
+ /* Cleanup. */
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+
+ progress_report(tablespacenum, true, false);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1296,184 +1151,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
-
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
+ bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
+ progress_report(state->tablespacenum, false, false);
}
@@ -1498,242 +1179,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true, false);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2032,16 +1477,32 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
+ /* Receive a tar file for each tablespace in turn */
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named base.tar
+ * if it's the main data directory or <tablespaceoid>.tar if it's for
+ * another tablespace. CreateBackupStreamer() will arrange to add .gz
+ * to the archive name if pg_basebackup is performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
+ }
/*
* Now receive backup manifest, if appropriate.
@@ -2057,7 +1518,10 @@ BaseBackup(void)
ReceiveBackupManifest(conn);
if (showprogress)
- progress_report(PQntuples(res), NULL, true, true);
+ {
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true, true);
+ }
PQclear(res);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 20be33a79d..bd08cab6f1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3771,3 +3771,13 @@ bbsink
bbsink_ops
bbsink_state
bbsink_throttle
+bbstreamer
+bbstreamer_archive_context
+bbstreamer_extractor
+bbstreamer_gzip_writer
+bbstreamer_member
+bbstreamer_ops
+bbstreamer_plain_writer
+bbstreamer_recovery_injector
+bbstreamer_tar_archiver
+bbstreamer_tar_parser
--
2.24.3 (Apple Git-128)
v7-0002-Introduce-bbsink-abstraction-to-modularize-base-b.patch
From 45139759c90fed14ad285bd3341f342e43f01ec2 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 30 Jun 2021 11:45:50 -0400
Subject: [PATCH v7 2/6] Introduce 'bbsink' abstraction to modularize base
backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup process and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc', but in the future we might introduce
other options.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
---
src/backend/replication/Makefile | 4 +
src/backend/replication/backup_manifest.c | 28 +-
src/backend/replication/basebackup.c | 674 +++++-------------
src/backend/replication/basebackup_copy.c | 324 +++++++++
src/backend/replication/basebackup_progress.c | 250 +++++++
src/backend/replication/basebackup_sink.c | 115 +++
src/backend/replication/basebackup_throttle.c | 198 +++++
src/include/replication/backup_manifest.h | 5 +-
src/include/replication/basebackup_sink.h | 275 +++++++
src/tools/pgindent/typedefs.list | 4 +
10 files changed, 1363 insertions(+), 514 deletions(-)
create mode 100644 src/backend/replication/basebackup_copy.c
create mode 100644 src/backend/replication/basebackup_progress.c
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/backend/replication/basebackup_throttle.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..74b97cf126 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,10 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_copy.o \
+ basebackup_progress.o \
+ basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 04ca455ace..4fe11a3b5c 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -310,9 +311,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -352,38 +352,28 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
- *
- * We choose to read back the data from the temporary file in chunks of
- * size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
- * size, so it seems to make sense to match that value here.
+ * Send the backup manifest.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
- char manifestbuf[BLCKSZ];
size_t bytes_to_read;
size_t rc;
- bytes_to_read = Min(sizeof(manifestbuf),
+ bytes_to_read = Min(sink->bbs_buffer_length,
manifest->manifest_size - manifest_bytes_done);
- rc = BufFileRead(manifest->buffile, manifestbuf, bytes_to_read);
+ rc = BufFileRead(manifest->buffile, sink->bbs_buffer,
+ bytes_to_read);
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b31c36d918..0cd118f1f1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -46,6 +43,16 @@
#include "utils/resowner.h"
#include "utils/timestamp.h"
+/*
+ * How much data do we want to send in one CopyData message? Note that
+ * this may also result in reading the underlying files in chunks of this
+ * size.
+ *
+ * NB: The buffer size is required to be a multiple of the system block
+ * size, so use that value instead if it's bigger than our preference.
+ */
+#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+
typedef struct
{
const char *label;
@@ -59,27 +66,25 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
+static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
static void perform_base_backup(basebackup_options *opt);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -90,46 +95,12 @@ static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
-/*
- * Size of each block sent into the tar stream for larger files.
- */
-#define TAR_SEND_SIZE 32768
-
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
-/* The starting XLOG position of the base backup. */
-static XLogRecPtr startptr;
-
/* Total number of checksum failures during base backup. */
static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -255,30 +226,29 @@ static const struct exclude_list_item noChecksumFiles[] = {
static void
perform_base_backup(basebackup_options *opt)
{
- TimeLineID starttli;
+ bbsink_state state;
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- List *tablespaces = NIL;
+ bbsink *sink = bbsink_copytblspc_new();
+ bbsink *progress_sink;
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ /* Initial backup state, insofar as we know it now. */
+ state.tablespaces = NIL;
+ state.tablespace_num = 0;
+ state.bytes_done = 0;
+ state.bytes_total = 0;
+ state.bytes_total_is_valid = false;
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Set up network throttling, if client requested it */
+ if (opt->maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt->maxrate);
+
+ /* Set up progress reporting. */
+ sink = progress_sink = bbsink_progress_new(sink, opt->progress);
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -295,11 +265,11 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
- startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
- tblspc_map_file);
+ basebackup_progress_wait_checkpoint();
+ state.startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint,
+ &state.starttli,
+ labelfile, &state.tablespaces,
+ tblspc_map_file);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -312,7 +282,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -329,7 +298,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = -1;
- tablespaces = lappend(tablespaces, ti);
+ state.tablespaces = lappend(state.tablespaces, ti);
/*
* Calculate the total backup size by summing up the size of each
@@ -337,100 +306,53 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
- NULL);
+ tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
+ true, NULL, NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
+ state.bytes_total += tmp->size;
}
+ state.bytes_total_is_valid = true;
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
-
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, &state, SINK_BUFFER_LENGTH);
/* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
+ sendDir(sink, ".", 1, false, state.tablespaces,
+ sendtblspclinks, &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -438,32 +360,33 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ Assert(lnext(state.tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
+ bbsink_end_archive(sink);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(progress_sink);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -489,8 +412,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -501,7 +423,7 @@ perform_base_backup(basebackup_options *opt)
* shouldn't be such files, but if there are, there's little harm in
* including them.
*/
- XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLByteToSeg(state.startptr, startsegno, wal_segment_size);
XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
@@ -591,7 +513,6 @@ perform_base_backup(basebackup_options *opt)
{
char *walFileName = (char *) lfirst(lc);
int fd;
- char buf[TAR_SEND_SIZE];
size_t cnt;
pgoff_t len = 0;
@@ -630,22 +551,17 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
- while ((cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf),
+ while ((cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length,
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -674,7 +590,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +613,23 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
+ AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
+ endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -739,7 +655,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -961,155 +877,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
perform_base_backup(&opt);
}
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
-
- pq_endmessage(&buf);
- }
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
-
- /*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
- */
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", LSN_FORMAT_ARGS(ptr));
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
+ int bytes_done = 0,
len;
pg_checksum_context checksum_ctx;
@@ -1135,25 +911,23 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
- update_basebackup_progress(len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
+ elog(ERROR, "could not update checksum of file \"%s\"",
+ filename);
+
+ while (bytes_done < len)
{
- char buf[TAR_BLOCK_SIZE];
+ size_t remaining = len - bytes_done;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
+ memcpy(sink->bbs_buffer, content, nbytes);
+ bbsink_archive_contents(sink, nbytes);
+ bytes_done += nbytes;
}
- if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
- elog(ERROR, "could not update checksum of file \"%s\"",
- filename);
+ _tarWritePadding(sink, len);
AddFileToBackupManifest(manifest, NULL, filename, len,
(pg_time_t) statbuf.st_mtime, &checksum_ctx);
@@ -1167,7 +941,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1197,11 +971,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1220,8 +994,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1381,8 +1155,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1399,8 +1173,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1413,15 +1187,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1452,7 +1226,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1476,7 +1250,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1508,7 +1282,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1516,7 +1290,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1593,21 +1367,19 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
bool block_retry = false;
- char buf[TAR_SEND_SIZE];
uint16 checksum;
int checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
@@ -1628,7 +1400,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1669,9 +1441,11 @@ sendFile(const char *readfilename, const char *tarfilename,
*/
while (len < statbuf->st_size)
{
+ size_t remaining = statbuf->st_size - len;
+
/* Try to read some more data. */
- cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf), statbuf->st_size - len),
+ cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length, remaining),
len, readfilename, true);
/*
@@ -1688,7 +1462,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read a multiple of
* BLCKSZ bytes.
*/
- Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
if (verify_checksum && (cnt % BLCKSZ != 0))
{
@@ -1704,7 +1478,7 @@ sendFile(const char *readfilename, const char *tarfilename,
{
for (i = 0; i < cnt / BLCKSZ; i++)
{
- page = buf + BLCKSZ * i;
+ page = sink->bbs_buffer + BLCKSZ * i;
/*
* Only check pages which have not been modified since the
@@ -1714,7 +1488,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* this case. We also skip completely new pages, since they
* don't have a checksum yet.
*/
- if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+ if (!PageIsNew(page) && PageGetLSN(page) < sink->bbs_state->startptr)
{
checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
phdr = (PageHeader) page;
@@ -1736,7 +1510,8 @@ sendFile(const char *readfilename, const char *tarfilename,
/* Reread the failed block */
reread_cnt =
- basebackup_read_file(fd, buf + BLCKSZ * i,
+ basebackup_read_file(fd,
+ sink->bbs_buffer + BLCKSZ * i,
BLCKSZ, len + BLCKSZ * i,
readfilename,
false);
@@ -1783,34 +1558,29 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
/* Also feed it to the checksum machinery. */
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer, cnt) < 0)
elog(ERROR, "could not update checksum of base backup");
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
- if (len < statbuf->st_size)
+ while (len < statbuf->st_size)
{
- MemSet(buf, 0, sizeof(buf));
- while (len < statbuf->st_size)
- {
- cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
- elog(ERROR, "could not update checksum of base backup");
- update_basebackup_progress(cnt);
- len += cnt;
- throttle(cnt);
- }
+ size_t remaining = statbuf->st_size - len;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
+
+ MemSet(sink->bbs_buffer, 0, nbytes);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ nbytes) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ bbsink_archive_contents(sink, nbytes);
+ len += nbytes;
}
/*
@@ -1818,13 +1588,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
- }
+ _tarWritePadding(sink, len);
CloseTransientFile(fd);
@@ -1847,18 +1611,28 @@ sendFile(const char *readfilename, const char *tarfilename,
return true;
}
-
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[TAR_BLOCK_SIZE];
enum tarError rc;
if (!sizeonly)
{
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ /*
+ * As of this writing, the smallest supported block size is 1kB, which
+ * is twice TAR_BLOCK_SIZE. Since the buffer size is required to be a
+ * multiple of BLCKSZ, it should be safe to assume that the buffer is
+ * large enough to fit an entire tar block. We double-check by means of
+ * these assertions.
+ */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= BLCKSZ,
+ "BLCKSZ too small for tar block");
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ rc = tarCreateHeader(sink->bbs_buffer, filename, linktarget,
+ statbuf->st_size, statbuf->st_mode,
+ statbuf->st_uid, statbuf->st_gid,
statbuf->st_mtime);
switch (rc)
@@ -1880,134 +1654,48 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
- update_basebackup_progress(sizeof(h));
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
}
- return sizeof(h);
-}
-
-/*
- * If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
- */
-static void
-convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
-{
- /* If symlink, write it as a directory anyway */
-#ifndef WIN32
- if (S_ISLNK(statbuf->st_mode))
-#else
- if (pgwin32_is_junction(pathbuf))
-#endif
- statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
+ return TAR_BLOCK_SIZE;
}
/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
+ * Pad with zero bytes out to a multiple of TAR_BLOCK_SIZE.
*/
static void
-throttle(size_t increment)
+_tarWritePadding(bbsink *sink, int len)
{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
+ int pad = tarPaddingBytesRequired(len);
/*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
+ * As in _tarWriteHeader, it should be safe to assume that the buffer is
+ * large enough that we don't need to do this in multiple chunks.
*/
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+ Assert(pad <= TAR_BLOCK_SIZE);
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
-
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
+ if (pad > 0)
+ {
+ MemSet(sink->bbs_buffer, 0, pad);
+ bbsink_archive_contents(sink, pad);
}
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
}
/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
static void
-update_basebackup_progress(int64 delta)
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
+ /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+ if (S_ISLNK(statbuf->st_mode))
+#else
+ if (pgwin32_is_junction(pathbuf))
+#endif
+ statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
/*
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
new file mode 100644
index 0000000000..564f010188
--- /dev/null
+++ b/src/backend/replication/basebackup_copy.c
@@ -0,0 +1,324 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_copy.c
+ * send basebackup archives using one COPY OUT operation per
+ * tablespace, and an additional COPY OUT for the backup manifest
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_copy.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_copytblspc_begin_backup(bbsink *sink);
+static void bbsink_copytblspc_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copytblspc_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_archive(bbsink *sink);
+static void bbsink_copytblspc_begin_manifest(bbsink *sink);
+static void bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_manifest(bbsink *sink);
+static void bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendTablespaceList(List *tablespaces);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+
+const bbsink_ops bbsink_copytblspc_ops = {
+ .begin_backup = bbsink_copytblspc_begin_backup,
+ .begin_archive = bbsink_copytblspc_begin_archive,
+ .archive_contents = bbsink_copytblspc_archive_contents,
+ .end_archive = bbsink_copytblspc_end_archive,
+ .begin_manifest = bbsink_copytblspc_begin_manifest,
+ .manifest_contents = bbsink_copytblspc_manifest_contents,
+ .end_manifest = bbsink_copytblspc_end_manifest,
+ .end_backup = bbsink_copytblspc_end_backup
+};
+
+/*
+ * Create a new 'copytblspc' bbsink.
+ */
+bbsink *
+bbsink_copytblspc_new(void)
+{
+ bbsink *sink = palloc0(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
+
+ return sink;
+}
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_copytblspc_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ /* Create a suitable buffer. */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is set as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_copytblspc_archive_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", LSN_FORMAT_ARGS(ptr));
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a result set via libpq describing the tablespace list.
+ */
+static void
+SendTablespaceList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
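My reading of the wire traffic this sink produces, sketched for review
(not normative):

    result set            -- begin_backup: start LSN and timeline
    result set            -- begin_backup: tablespace list
    CopyOutResponse       -- begin_archive
    CopyData (repeated)   -- archive_contents
    CopyDone              -- end_archive
    CopyOutResponse       -- begin_manifest
    CopyData (repeated)   -- manifest_contents
    CopyDone              -- end_manifest
    result set            -- end_backup: end LSN and timeline

The CopyOutResponse/CopyData/CopyDone sequence for archives repeats once
per tablespace, with the main data directory ("base.tar") always last.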
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..79f4d9dea3
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress tracking, including but not
+ * limited to command progress reporting.
+ *
+ * This should be used even if the PROGRESS option to the replication
+ * command BASE_BACKUP is not specified. Without that option, we won't
+ * have tallied up the size of the files that are going to need to be
+ * backed up, but we can still report to the command progress reporting
+ * facility how much data we've processed.
+ *
+ * Moreover, we also use this as a convenient place to update certain
+ * fields of the bbsink_state. That work is accurately described as
+ * keeping track of our progress, but it's not just for introspection.
+ * We need those fields to be updated properly in order for base backups
+ * to work.
+ *
+ * This particular basebackup sink requires extra callbacks that most base
+ * backup sinks don't. Rather than cramming those into the interface, we just
+ * have a few extra functions here that basebackup.c can call. (We could put
+ * the logic directly into that file as it's fairly simple, but it seems
+ * cleaner to have everything related to progress reporting in one place.)
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+static void bbsink_progress_begin_backup(bbsink *sink);
+static void bbsink_progress_archive_contents(bbsink *sink, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * Create a new basebackup sink that performs progress tracking functions and
+ * forwards data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink));
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_progress_ops;
+ sink->bbs_next = next;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of the
+ * backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL, -1);
+
+ return sink;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+
+ /*
+ * Report that we are now streaming database files as a base backup. Also
+ * advertise the number of tablespaces, and, if known, the estimated total
+ * backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ if (sink->bbs_state->bytes_total_is_valid)
+ val[1] = sink->bbs_state->bytes_total;
+ else
+ val[1] = -1;
+ val[2] = list_length(sink->bbs_state->tablespaces);
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ bbsink_forward_begin_backup(sink);
+}
+
+/*
+ * End-of archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ /*
+ * We expect one archive per tablespace, so reaching the end of an archive
+ * also means reaching the end of a tablespace. (Some day we might have a
+ * reason to decouple these concepts.)
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (sink->bbs_state->tablespace_num < list_length(sink->bbs_state->tablespaces))
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ sink->bbs_state->tablespace_num + 1);
+
+ /* Delegate to next sink. */
+ bbsink_forward_end_archive(sink);
+
+ /*
+ * This is a convenient place to update the bbsink_state's notion of which
+ * is the current tablespace. Note that the bbsink_state object is shared
+ * across all bbsink objects involved, but we're the outermost one and
+ * this is the very last thing we do.
+ */
+ sink->bbs_state->tablespace_num++;
+}
+
+/*
+ * Handle progress tracking for new archive contents.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+
+ /* First update bbsink_state with # of bytes done. */
+ state->bytes_done += len;
+
+ /* Now forward to next sink. */
+ bbsink_forward_archive_contents(sink, len);
+
+ /* Prepare to set # of bytes done for command progress reporting. */
+ val[nparam++] = state->bytes_done;
+
+ /*
+ * We may also want to update # of total bytes, to avoid overflowing past
+ * 100% or the full size. This may make the total size number change as we
+ * approach the end of the backup (the estimate will always be wrong if
+ * WAL is included), but that's better than having the done column be
+ * bigger than the total.
+ */
+ if (state->bytes_total_is_valid && state->bytes_done > state->bytes_total)
+ val[nparam++] = state->bytes_done;
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+
+ Assert(sink->bbs_ops == &bbsink_progress_ops);
+ Assert(state->tablespace_num >= list_length(state->tablespaces) - 1);
+ Assert(state->tablespace_num <= list_length(state->tablespaces));
+
+ /*
+ * We report having finished all tablespaces at this point, even if the
+ * archive for the main tablespace is still open, because what's going to
+ * be added is WAL files, not files that are really from the main
+ * tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = list_length(state->tablespaces);
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
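Reading this file together with the perform_base_backup() hunks above,
the calls fire in this order over the life of a backup (a summary, not
new code):

    basebackup_progress_wait_checkpoint()       before do_pg_start_backup()
    basebackup_progress_estimate_backup_size()  only if PROGRESS was requested
    bbsink_progress_begin_backup()              via bbsink_begin_backup()
    bbsink_progress_archive_contents()          for each buffer streamed
    bbsink_progress_end_archive()               once per tablespace archive
    basebackup_progress_wait_wal_archive()      before do_pg_stop_backup()
    basebackup_progress_transfer_wal()          only if WAL is included
    basebackup_progress_done()                  at the very end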
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..14104f50e8
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,115 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+/*
+ * Forward begin_backup callback.
+ *
+ * Only use this implementation if you want the bbsink you're implementing to
+ * share a buffer with the successor bbsink.
+ */
+void
+bbsink_forward_begin_backup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_state != NULL);
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+ sink->bbs_buffer = sink->bbs_next->bbs_buffer;
+}
+
+/*
+ * Forward begin_archive callback.
+ */
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+/*
+ * Forward archive_contents callback.
+ *
+ * Code that wants to use this should initialize its own bbs_buffer and
+ * bbs_buffer_length fields to the values from the successor sink. In cases
+ * where the buffer isn't shared, the data needs to be copied before forwarding
+ * the callback. We don't try to do that here, because there's really no
+ * reason to have separately allocated buffers containing identical data.
+ */
+void
+bbsink_forward_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_archive_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_archive callback.
+ */
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * Forward begin_manifest callback.
+ */
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward manifest_contents callback.
+ *
+ * As with the archive_contents callback, it's expected that the buffer is
+ * shared.
+ */
+void
+bbsink_forward_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_manifest callback.
+ */
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward end_backup callback.
+ */
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..1606463291
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,198 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* The number of bytes whose transfer may trigger a throttling sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink);
+static void bbsink_throttle_archive_contents(bbsink *sink, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ bbsink_forward_begin_backup(sink);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * As we work with integers, only a whole multiple of throttling_sample has
+ * been processed. The rest will be done during the next call of this
+ * function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 099108910c..16ed7eec9b 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,7 +47,8 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
extern void FreeBackupManifest(backup_manifest_info *manifest);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..41c9c367f7
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,275 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * Taking a base backup produces one archive per tablespace directory,
+ * plus a backup manifest unless that feature has been disabled. The
+ * goal of the backup process is to put those archives and that manifest
+ * someplace, possibly after postprocessing them in some way. A 'bbsink'
+ * is an object to which those archives, and the manifest if present,
+ * can be sent.
+ *
+ * In practice, there will be a chain of 'bbsink' objects rather than
+ * just one, with callbacks being forwarded from one to the next,
+ * possibly with modification. Each object is responsible for a
+ * single task e.g. command progress reporting, throttling, or
+ * communication with the client.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Overall backup state shared by all bbsink objects for a backup.
+ *
+ * Before calling bbsink_begin_backup(), the caller must initialize a
+ * bbsink_state object which will last for the lifetime of the backup, and
+ * must thereafter update it as required before each new call to a bbsink
+ * method. The bbsink
+ * will retain a pointer to the state object and will consult it to understand
+ * the progress of the backup.
+ *
+ * 'tablespaces' is a list of tablespaceinfo objects. It must be set before
+ * calling bbsink_begin_backup() and must not be modified thereafter.
+ *
+ * 'tablespace_num' is the index of the current tablespace within the list
+ * stored in 'tablespaces'.
+ *
+ * 'bytes_done' is the number of bytes read so far from $PGDATA.
+ *
+ * 'bytes_total' is the total number of bytes estimated to be present in
+ * $PGDATA, if we have estimated this.
+ *
+ * 'bytes_total_is_valid' is true if and only if a proper estimate has been
+ * stored into 'bytes_total'.
+ *
+ * 'startptr' and 'starttli' identify the point in the WAL stream at which
+ * the backup began. They must be set before calling bbsink_begin_backup()
+ * and must not be modified thereafter.
+ */
+typedef struct bbsink_state
+{
+ List *tablespaces;
+ int tablespace_num;
+ uint64 bytes_done;
+ uint64 bytes_total;
+ bool bytes_total_is_valid;
+ XLogRecPtr startptr;
+ TimeLineID starttli;
+} bbsink_state;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. Its length must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * 'bbs_state' is a pointer to the bbsink_state object for this backup.
+ * Every bbsink associated with this backup should point to the same
+ * underlying state object.
+ *
+ * In general it is expected that the values of these fields are set when
+ * a bbsink is created and that they do not change thereafter. It's OK
+ * to modify the data to which bbs_buffer or bbs_state point, but no changes
+ * should be made to the contents of this struct.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ char *bbs_buffer;
+ size_t bbs_buffer_length;
+ bbsink *bbs_next;
+ bbsink_state *bbs_state;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline
+ * functions rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /*
+ * This callback is invoked just once, at the very start of the backup.
+ * It must set bbs_buffer to point to a chunk of storage where at least
+ * bbs_buffer_length bytes of data can be written.
+ */
+ void (*begin_backup) (bbsink *sink);
+
+ /*
+ * For each archive transmitted to a bbsink, there will be one call to the
+ * begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ *
+ * Before invoking the archive_contents() callback, the caller should copy
+ * a number of bytes equal to what will be passed as len into bbs_buffer,
+ * but not more than bbs_buffer_length.
+ *
+ * It's generally good if the buffer is as full as possible before the
+ * archive_contents() callback is invoked, but it's not worth expending
+ * extra cycles to make sure it's absolutely 100% full.
+ */
+ void (*begin_archive) (bbsink *sink, const char *archive_name);
+ void (*archive_contents) (bbsink *sink, size_t len);
+ void (*end_archive) (bbsink *sink);
+
+ /*
+ * If a backup manifest is to be transmitted to a bbsink, there will be
+ * one call to the begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback. These calls will occur after all archives are transmitted.
+ *
+ * The rules for invoking the manifest_contents() callback are the same as
+ * for the archive_contents() callback above.
+ */
+ void (*begin_manifest) (bbsink *sink);
+ void (*manifest_contents) (bbsink *sink, size_t len);
+ void (*end_manifest) (bbsink *sink);
+
+ /* This callback is invoked just once, at the very end of the backup. */
+ void (*end_backup) (bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, bbsink_state *state, int buffer_length)
+{
+ Assert(sink != NULL);
+
+ Assert(buffer_length > 0);
+
+ sink->bbs_state = state;
+ sink->bbs_buffer_length = buffer_length;
+ sink->bbs_ops->begin_backup(sink);
+
+ Assert(sink->bbs_buffer != NULL);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /*
+ * The caller should make a reasonable attempt to fill the buffer before
+ * calling this function, so it shouldn't be completely empty. Nor should
+ * it be filled beyond capacity.
+ */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->archive_contents(sink, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /* See comments in bbsink_archive_contents. */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->manifest_contents(sink, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ Assert(sink->bbs_state->tablespace_num == list_length(sink->bbs_state->tablespaces));
+
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cb5b5ec74c..20be33a79d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3767,3 +3767,7 @@ yyscan_t
z_stream
z_streamp
zic_t
+bbsink
+bbsink_ops
+bbsink_state
+bbsink_throttle
--
2.24.3 (Apple Git-128)
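To see how the pieces above are meant to fit together, here is a minimal
sketch -- illustrative only, with assumed variable names and an assumed
buffer size, not code taken from the patch -- of how a caller such as
perform_base_backup() might chain the sinks declared in basebackup_sink.h:

    bbsink     *sink;
    bbsink_state state;

    /* Final destination: stream archives to the client via libpq. */
    sink = bbsink_copytblspc_new();

    /* Optionally rate-limit, and put progress reporting in front. */
    if (maxrate > 0)
        sink = bbsink_throttle_new(sink, maxrate);
    sink = bbsink_progress_new(sink, estimate_backup_size);

    /* The caller initializes the shared state before starting. */
    memset(&state, 0, sizeof(state));
    state.tablespaces = tablespaces;
    state.startptr = startptr;
    state.starttli = starttli;

    bbsink_begin_backup(sink, &state, 8 * BLCKSZ);
    /* ... one begin_archive / archive_contents / end_archive cycle per
     * tablespace, then the manifest callbacks ... */
    bbsink_end_backup(sink, endptr, endtli);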
On Tue, Oct 5, 2021 at 5:51 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
I have fixed the autoFlush issue. Basically, I was wrongly initializing
the lz4 preferences in bbsink_lz4_begin_archive() instead of
bbsink_lz4_begin_backup(). I have fixed the issue in the attached
patch, please have a look at it.
Thanks for the new patch. Seems like this is getting closer, but:
+/*
+ * Read the input buffer in CHUNK_SIZE length in each iteration and pass it to
+ * the lz4 compression. Defined as 8k, since the input buffer is multiple of
+ * BLCKSZ i.e. multiple of 8k.
+ */
+#define CHUNK_SIZE 8192
BLCKSZ does not have to be 8kB.
+ size_t compressedSize;
+ int nextChunkLen = CHUNK_SIZE;
+
+ /* Last chunk to be read from the input. */
+ if (avail_in < CHUNK_SIZE)
+ nextChunkLen = avail_in;
This is the only place where CHUNK_SIZE gets used, and I don't think I
see any point to it. I think the 5th argument to LZ4F_compressUpdate
could just be avail_in. And as soon as you do that then I think
bbsink_lz4_archive_contents() no longer needs to be a loop. For gzip,
the output buffer isn't guaranteed to be big enough to write all the
data, so the compression step can fail to compress all the data. But
LZ4 forces us to make the output buffer big enough that no such
failure can happen. Therefore, that can't happen here except if you
artificially limit the amount of data that you pass to
LZ4F_compressUpdate() to something less than the size of the input
buffer. And I don't see any reason to do that.
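Concretely, I think the whole body could reduce to a single call, along
these lines (untested sketch, using the field names from your patch):

    compressedSize = LZ4F_compressUpdate(mysink->ctx,
                                         mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
                                         mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
                                         mysink->base.bbs_buffer,
                                         avail_in,
                                         NULL);
    if (LZ4F_isError(compressedSize))
        elog(ERROR, "could not compress data: %s",
             LZ4F_getErrorName(compressedSize));
    mysink->bytes_written += compressedSize;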
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
I think there's some issue with these two chunks of code. What happens
if one of these functions wants to write more data than will fit in
the output buffer? It seems like either there needs to be some code
someplace that ensures adequate space in the output buffer at the time
of these calls, or else there needs to be a retry loop that writes as
much of the data as possible, flushes the output buffer, and then
loops to generate more output data. But there's clearly no retry loop
here, and I don't see any code that guarantees that the output buffer
has to be large enough (and in the case of LZ4F_compressEnd, have
enough remaining space) either. In other words, all the same concerns
that apply to LZ4F_compressUpdate() also apply here ... but in
LZ4F_compressUpdate() you seem to BOTH have a retry loop and ALSO code
to make sure that the buffer is certain to be large enough (which is
more than you need, you only need one of those) and here you seem to
have NEITHER of those things (which is not enough, you need one or the
other).
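For the LZ4F_compressEnd() case, I'd expect something along these lines
(untested sketch): compute the footer bound, flush if the remaining space
might not be enough, and only then finalize:

    lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
    if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
        lz4_footer_bound)
    {
        bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
        mysink->bytes_written = 0;
    }
    compressedSize = LZ4F_compressEnd(mysink->ctx,
                                      mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
                                      mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
                                      NULL);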
+ /* Initialize compressor object. */
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->frameInfo.blockMode = LZ4F_blockLinked;
+ prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+ prefs->frameInfo.frameType = LZ4F_frame;
+ prefs->frameInfo.contentSize = 0;
+ prefs->frameInfo.dictID = 0;
+ prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+ prefs->compressionLevel = 0;
+ prefs->autoFlush = 0;
+ prefs->favorDecSpeed = 0;
+ prefs->reserved[0] = 0;
+ prefs->reserved[1] = 0;
+ prefs->reserved[2] = 0;
How about instead using memset() to zero the whole thing and then
omitting the zero initializations? That seems like it would be less
fragile, if the upstream structure definition ever changes.
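That is, something like:

    memset(&mysink->prefs, 0, sizeof(LZ4F_preferences_t));
    mysink->prefs.frameInfo.blockSizeID = LZ4F_max256KB;  /* the only non-default */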
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks, Robert for reviewing the patch.
On Tue, Oct 12, 2021 at 11:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
This is the only place where CHUNK_SIZE gets used, and I don't think I
see any point to it. I think the 5th argument to LZ4F_compressUpdate
could just be avail_in. And as soon as you do that then I think
bbsink_lz4_archive_contents() no longer needs to be a loop.
Agree. Removed the CHUNK_SIZE and the loop.
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
I think there's some issue with these two chunks of code. What happens
if one of these functions wants to write more data than will fit in
the output buffer? It seems like either there needs to be some code
someplace that ensures adequate space in the output buffer at the time
of these calls, or else there needs to be a retry loop that writes as
much of the data as possible, flushes the output buffer, and then
loops to generate more output data. But there's clearly no retry loop
here, and I don't see any code that guarantees that the output buffer
has to be large enough (and in the case of LZ4F_compressEnd, have
enough remaining space) either. In other words, all the same concerns
that apply to LZ4F_compressUpdate() also apply here ... but in
LZ4F_compressUpdate() you seem to BOTH have a retry loop and ALSO code
to make sure that the buffer is certain to be large enough (which is
more than you need, you only need one of those) and here you seem to
have NEITHER of those things (which is not enough, you need one or the
other).
Fair enough. I have made the change in bbsink_lz4_begin_backup() to
make sure we reserve enough extra bytes for the header and the footer
that are written by LZ4F_compressBegin() and LZ4F_compressEnd()
respectively. LZ4F_compressBound(), when passed an input size of "0",
gives the upper bound for the output buffer needed by LZ4F_compressEnd().
How about instead using memset() to zero the whole thing and then
omitting the zero initializations? That seems like it would be less
fragile, if the upstream structure definition ever changes.
Made this change.
Please review the patch, and let me know your comments.
Regards,
Jeevan Ladhe
Attachments:
lz4_compress_v5.patch (application/octet-stream)
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d6df3fdeb2..64641903bf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -303,6 +304,8 @@ perform_base_backup(basebackup_options *opt)
/* Set up server-side compression, if client requested it */
if (opt->compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt->compression_level);
+ else if (opt->compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -936,6 +939,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..e379c3b61e
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,270 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+#define CHUNK_SIZE 8192
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+ size_t output_buffer_bound;
+ size_t lz4_footer_bound;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t next_buf_len;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Remember the compressed buffer bound needed for input buffer to avoid
+ * recomputation in bbsink_lz4_archive_contents().
+ */
+ mysink->output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /* Get the upper bound for the buffer as expected by LZ4F_compressEnd(). */
+ mysink->lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+
+ /*
+ * Since LZ4F_compressUpdate() requires the output buffer of size equal or
+ * greater than that of LZ4F_compressBound(), make sure we have the next
+ * sink's bbs_buffer of length that can accommodate the compressed input
+ * buffer. Also, make sure that the buffer has enough space to accommodate
+ * for the header and the footer.
+ */
+ next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound +
+ mysink->lz4_footer_bound;
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ next_buf_len = next_buf_len + BLCKSZ - (next_buf_len % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, next_buf_len);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /*
+ * If the number of available bytes has fallen below the value computed
+ * by LZ4F_compressBound(), ask the next sink to process the data so
+ * that we can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written -
+ mysink->lz4_footer_bound) < mysink->output_buffer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+}
+
+/*
+ * Finalize the lz4 frame and then get that forwarded to the successor sink
+ * as archive content. Then, we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written <= mysink->base.bbs_next->bbs_buffer_length -
+ lz4_footer_bound);
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index f09aecb53b..84dc305d56 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -264,6 +264,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
On Thu, Oct 14, 2021 at 1:21 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
Agree. Removed the CHUNK_SIZE and the loop.
Try harder. :-)
The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner.
Fair enough. I have made the change in bbsink_lz4_begin_backup() to
make sure we reserve enough extra bytes for the header and the footer
that are written by LZ4F_compressBegin() and LZ4F_compressEnd()
respectively. LZ4F_compressBound(), when passed an input size of "0",
gives the upper bound for the output buffer needed by LZ4F_compressEnd().
I think this is not the best way to accomplish the goal. Adding
LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
bigger for something that's only going to happen once. We are assuming
in any case, I think, that LZ4F_compressBound(0) <=
LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
do is have bbsink_end_archive() empty the buffer, if necessary, before
calling LZ4F_compressEnd(). With just that change, you can set
next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
but that's also more than you need. You can instead do next_buf_len =
Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
probably thinking that won't work, because bbsink_lz4_begin_archive()
could fill up the buffer partway, and then the first call to
bbsink_lz4_archive_contents() could overrun it. But that problem can
be solved by reversing the order of operations in
bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
test whether you need to empty the buffer first, and if so, do it.
That's actually less confusing than the way you've got it, because as
you have it written, we don't really know why we're emptying the
buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
or is it to prepare for the call to LZ4F_compressEnd()? How do we know
now how much space the next person writing into the buffer is going to
need? It seems better if bbsink_lz4_archive_contents() empties the
buffer before calling LZ4F_compressUpdate() if that call might not
have enough space, and likewise bbsink_lz4_end_archive() empties the
buffer before calling LZ4F_compressEnd() if that's needed. That way,
each callback makes the space *it* needs, not the space the *next*
caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
buffer after LZ4F_compressEnd(), so we don't orphan any data.)
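In other words, the update path would look roughly like this (untested
sketch, reusing your field names):

    /* If this call's output might not fit, empty the buffer first. */
    avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
    if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
        avail_in_bound)
    {
        bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
        mysink->bytes_written = 0;
    }

    /* Now there is guaranteed to be room for this call's output. */
    compressedSize = LZ4F_compressUpdate(mysink->ctx,
                                         mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
                                         mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
                                         mysink->base.bbs_buffer,
                                         avail_in,
                                         NULL);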
On another note, if the call to LZ4F_freeCompressionContext() is
required in bbsink_lz4_end_archive(), then I think this code is going
to just leak the memory used by the compression context if an error
occurs before this code is reached. That kind of sucks. The way to fix
it, I suppose, is a TRY/CATCH block, but I don't think that can be
something internal to basebackup_lz4.c: I think the bbsink stuff would
need to provide some kind of infrastructure for basebackup_lz4.c to
use. It would be a lot better if we could instead get LZ4 to allocate
memory using palloc(), but a quick Google search suggests that you
can't accomplish that without recompiling liblz4, and that's not
workable since we don't want to require a liblz4 built specifically
for PostgreSQL. Do you see any other solution?
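For illustration, the shape would be roughly this (sketch only; the
bbsink_cleanup() entry point is hypothetical and doesn't exist yet):

    PG_TRY();
    {
        /* run the backup, invoking the bbsink callbacks */
    }
    PG_CATCH();
    {
        /*
         * Walk the sink chain and let each sink release its resources,
         * e.g. call LZ4F_freeCompressionContext().
         */
        bbsink_cleanup(sink);
        PG_RE_THROW();
    }
    PG_END_TRY();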
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Robert,
The loop is gone, but CHUNK_SIZE itself seems to have evaded the
executioner.
I am sorry, but I did not really get it. Or is it what you have pointed
out in the following paragraphs?
I think this is not the best way to accomplish the goal. Adding
LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
bigger for something that's only going to happen once.
Yes, you are right. I missed this.
We are assuming in any case, I think, that LZ4F_compressBound(0) <=
LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
do is have bbsink_end_archive() empty the buffer, if necessary, before
calling LZ4F_compressEnd().
This is a fair enough assumption.
With just that change, you can set
next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
but that's also more than you need. You can instead do next_buf_len =
Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
probably thinking that won't work, because bbsink_lz4_begin_archive()
could fill up the buffer partway, and then the first call to
bbsink_lz4_archive_contents() could overrun it. But that problem can
be solved by reversing the order of operations in
bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
test whether you need to empty the buffer first, and if so, do it.
I am still not able to see how we can survive with a mere
size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
proposal, it is almost guaranteed that the next buffer length will
be always set to 19, which will result in failure of a call to
LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
even if we had called bbsink_archive_contents() before.
That's actually less confusing than the way you've got it, because as
you have it written, we don't really know why we're emptying the
buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
or is it to prepare for the call to LZ4F_compressEnd()? How do we know
now how much space the next person writing into the buffer is going to
need? It seems better if bbsink_lz4_archive_contents() empties the
buffer before calling LZ4F_compressUpdate() if that call might not
have enough space, and likewise bbsink_lz4_end_archive() empties the
buffer before calling LZ4F_compressEnd() if that's needed. That way,
each callback makes the space *it* needs, not the space the *next*
caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
buffer after LZ4F_compressEnd(), so we don't orphan any data.)
Sure, I get your point here.
On another note, if the call to LZ4F_freeCompressionContext() is
required in bbsink_lz4_end_archive(), then I think this code is going
to just leak the memory used by the compression context if an error
occurs before this code is reached. That kind of sucks.
Yes, the LZ4F_freeCompressionContext() is needed to clear the
LZ4F_cctx. The structure LZ4F_cctx_s maintains internal stages
of compression, internal buffers, etc.
The way to fix
it, I suppose, is a TRY/CATCH block, but I don't think that can be
something internal to basebackup_lz4.c: I think the bbsink stuff would
need to provide some kind of infrastructure for basebackup_lz4.c to
use. It would be a lot better if we could instead get LZ4 to allocate
memory using palloc(), but a quick Google search suggests that you
can't accomplish that without recompiling liblz4, and that's not
workable since we don't want to require a liblz4 built specifically
for PostgreSQL. Do you see any other solution?
You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.
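(For reference, the zlib arrangement I mean looks roughly like this; the
wrapper names are illustrative:)

    static void *
    gzip_palloc(void *opaque, unsigned items, unsigned size)
    {
        return palloc(size * items);
    }

    static void
    gzip_pfree(void *opaque, void *pointer)
    {
        pfree(pointer);
    }

    /* ... and then, when setting up the stream: */
    zs.zalloc = gzip_palloc;
    zs.zfree = gzip_pfree;
    zs.opaque = NULL;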
Regards,
Jeevan Ladhe
On Fri, Oct 15, 2021 at 7:54 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner.
I am sorry, but I did not really get it. Or is it what you have pointed
out in the following paragraphs?
I mean #define CHUNK_SIZE is still in the patch.
With just that change, you can set
next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
but that's also more than you need. You can instead do next_buf_len =
Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
probably thinking that won't work, because bbsink_lz4_begin_archive()
could fill up the buffer partway, and then the first call to
bbsink_lz4_archive_contents() could overrun it. But that problem can
be solved by reversing the order of operations in
bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
test whether you need to empty the buffer first, and if so, do it.
I am still not able to see how we can survive with a mere
size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
proposal, it is almost guaranteed that the next buffer length will
be always set to 19, which will result in failure of a call to
LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
even if we had called bbsink_archive_contents() before.
Sorry, should have been Max(), not Min().
You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.
OK. Will give it some thought.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Robert,
I mean #define CHUNK_SIZE is still in the patch.
Oops, removed now.
With just that change, you can set
next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
but that's also more than you need. You can instead do next_buf_len =
Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
probably thinking that won't work, because bbsink_lz4_begin_archive()
could fill up the buffer partway, and then the first call to
bbsink_lz4_archive_contents() could overrun it. But that problem can
be solved by reversing the order of operations in
bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
test whether you need to empty the buffer first, and if so, do it.
I am still not able to see how we can survive with a mere
size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
proposal, it is almost guaranteed that the next buffer length will
be always set to 19, which will result in failure of a call to
LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
even if we had called bbsink_archive_contents() before.
Sorry, should have been Max(), not Min().
Ahh ok.
I looked into the code of LZ4F_compressBound() and the header size is
already included in the calculations, so when we compare
LZ4F_HEADER_SIZE_MAX and mysink->output_buffer_bound, the latter
will always be greater, and hence a sufficient length for the output buffer. I
made this change in the attached patch, and also cleared the buffer by
calling bbsink_archive_contents() before calling LZ4F_compressUpdate().
Similarly cleared the buffer before calling LZ4F_compressEnd().
You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.
OK. Will give it some thought.
I have started a thread [1] on the LZ4 community list for this, but so far
no reply on that.
Regards,
Jeevan Ladhe
[1]: https://groups.google.com/g/lz4c/c/WnJkKwBWlcM/m/zszrla2mBQAJ?utm_medium=email&utm_source=footer
Attachments:
lz4_compress_v6.patch (application/octet-stream)
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 61c76160d1..7f05cb85bf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -303,6 +304,8 @@ perform_base_backup(basebackup_options *opt)
/* Set up server-side compression, if client requested it */
if (opt->compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt->compression_level);
+ else if (opt->compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = progress_sink = bbsink_progress_new(sink, opt->progress);
@@ -946,6 +949,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..37a419d8d0
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,266 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires the output buffer of size equal or
+ * greater than that of LZ4F_compressBound(), make sure we have the next
+ * sink's bbs_buffer of length that can accommodate the compressed input
+ * buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed
+ * by LZ4F_compressBound(), ask the next sink to process the data so
+ * that we can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 10a316cacd..b1accf3f8a 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -264,6 +264,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
On Fri, Oct 15, 2021 at 8:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.
OK. Will give it some thought.
Here's a new patch set. I've tried adding a "cleanup" callback to the
bbsink method table and ensuring that it gets called even in case of an
error. The code for that is untested since I have no use for it with
the existing basebackup sink types, so let me know how it goes when
you try to use it for LZ4.
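For LZ4, I'd expect the callback to end up looking something like this
(untested sketch; it assumes end_archive also resets mysink->ctx to
NULL after freeing the compression context):

    static void
    bbsink_lz4_cleanup(bbsink *sink)
    {
        bbsink_lz4 *mysink = (bbsink_lz4 *) sink;

        /* Free the compression context if end_archive never ran. */
        if (mysink->ctx != NULL)
        {
            LZ4F_freeCompressionContext(mysink->ctx);
            mysink->ctx = NULL;
        }

        /* Give downstream sinks the same chance to clean up. */
        bbsink_forward_cleanup(sink);
    }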
I've also added documentation for the new pg_basebackup options in
this version, and I fixed up a couple of these patches to be
pgindent-clean when they previously were not.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v8-0001-Introduce-bbsink-abstraction-to-modularize-base-b.patch
From 961bf99026be2ae986e09069e19fc71ddb723ddb Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 14:33:19 -0400
Subject: [PATCH v8 1/5] Introduce 'bbsink' abstraction to modularize base
backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup progress and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc', but in the future we might introduce
other options.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
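Assembled innermost-first, the chain SendBaseBackup() builds in this
patch comes out as:

    /* Innermost sink: speaks the COPY protocol to the client. */
    sink = bbsink_copytblspc_new();

    /* Rate limiting, if requested, wraps the network sink... */
    if (opt.maxrate > 0)
        sink = bbsink_throttle_new(sink, opt.maxrate);

    /* ...and progress reporting is outermost, seeing all data first. */
    sink = bbsink_progress_new(sink, opt.progress);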
---
src/backend/replication/Makefile | 4 +
src/backend/replication/backup_manifest.c | 28 +-
src/backend/replication/basebackup.c | 686 +++++-------------
src/backend/replication/basebackup_copy.c | 335 +++++++++
src/backend/replication/basebackup_progress.c | 246 +++++++
src/backend/replication/basebackup_sink.c | 125 ++++
src/backend/replication/basebackup_throttle.c | 199 +++++
src/include/replication/backup_manifest.h | 5 +-
src/include/replication/basebackup_sink.h | 296 ++++++++
src/tools/pgindent/typedefs.list | 4 +
10 files changed, 1415 insertions(+), 513 deletions(-)
create mode 100644 src/backend/replication/basebackup_copy.c
create mode 100644 src/backend/replication/basebackup_progress.c
create mode 100644 src/backend/replication/basebackup_sink.c
create mode 100644 src/backend/replication/basebackup_throttle.c
create mode 100644 src/include/replication/basebackup_sink.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a0381e52f3..74b97cf126 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,10 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = \
backup_manifest.o \
basebackup.o \
+ basebackup_copy.o \
+ basebackup_progress.o \
+ basebackup_sink.o \
+ basebackup_throttle.o \
repl_gram.o \
slot.o \
slotfuncs.o \
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 04ca455ace..4fe11a3b5c 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -17,6 +17,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "replication/backup_manifest.h"
+#include "replication/basebackup_sink.h"
#include "utils/builtins.h"
#include "utils/json.h"
@@ -310,9 +311,8 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
* Finalize the backup manifest, and send it to the client.
*/
void
-SendBackupManifest(backup_manifest_info *manifest)
+SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
{
- StringInfoData protobuf;
uint8 checksumbuf[PG_SHA256_DIGEST_LENGTH];
char checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
size_t manifest_bytes_done = 0;
@@ -352,38 +352,28 @@ SendBackupManifest(backup_manifest_info *manifest)
(errcode_for_file_access(),
errmsg("could not rewind temporary file")));
- /* Send CopyOutResponse message */
- pq_beginmessage(&protobuf, 'H');
- pq_sendbyte(&protobuf, 0); /* overall format */
- pq_sendint16(&protobuf, 0); /* natts */
- pq_endmessage(&protobuf);
/*
- * Send CopyData messages.
- *
- * We choose to read back the data from the temporary file in chunks of
- * size BLCKSZ; this isn't necessary, but buffile.c uses that as the I/O
- * size, so it seems to make sense to match that value here.
+ * Send the backup manifest.
*/
+ bbsink_begin_manifest(sink);
while (manifest_bytes_done < manifest->manifest_size)
{
- char manifestbuf[BLCKSZ];
size_t bytes_to_read;
size_t rc;
- bytes_to_read = Min(sizeof(manifestbuf),
+ bytes_to_read = Min(sink->bbs_buffer_length,
manifest->manifest_size - manifest_bytes_done);
- rc = BufFileRead(manifest->buffile, manifestbuf, bytes_to_read);
+ rc = BufFileRead(manifest->buffile, sink->bbs_buffer,
+ bytes_to_read);
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from temporary file: %m")));
- pq_putmessage('d', manifestbuf, bytes_to_read);
+ bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
-
- /* No more data, so send CopyDone message */
- pq_putemptymessage('c');
+ bbsink_end_manifest(sink);
/* Release resources */
BufFileClose(manifest->buffile);
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b31c36d918..482872b45c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,13 +17,9 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
-#include "catalog/pg_type.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
-#include "commands/progress.h"
#include "lib/stringinfo.h"
-#include "libpq/libpq.h"
-#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "pgstat.h"
@@ -31,6 +27,7 @@
#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -46,6 +43,16 @@
#include "utils/resowner.h"
#include "utils/timestamp.h"
+/*
+ * How much data do we want to send in one CopyData message? Note that
+ * this may also result in reading the underlying files in chunks of this
+ * size.
+ *
+ * NB: The buffer size is required to be a multiple of the system block
+ * size, so use that value instead if it's bigger than our preference.
+ */
+#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+
typedef struct
{
const char *label;
@@ -59,27 +66,25 @@ typedef struct
pg_checksum_type manifest_checksum_type;
} basebackup_options;
-static int64 sendTablespace(char *path, char *oid, bool sizeonly,
+static int64 sendTablespace(bbsink *sink, char *path, char *oid, bool sizeonly,
struct backup_manifest_info *manifest);
-static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
+static int64 sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks,
backup_manifest_info *manifest, const char *spcoid);
-static bool sendFile(const char *readfilename, const char *tarfilename,
+static bool sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid);
-static void sendFileWithContent(const char *filename, const char *content,
+static void sendFileWithContent(bbsink *sink, const char *filename,
+ const char *content,
backup_manifest_info *manifest);
-static int64 _tarWriteHeader(const char *filename, const char *linktarget,
- struct stat *statbuf, bool sizeonly);
+static int64 _tarWriteHeader(bbsink *sink, const char *filename,
+ const char *linktarget, struct stat *statbuf,
+ bool sizeonly);
+static void _tarWritePadding(bbsink *sink, int len);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
-static void send_int8_string(StringInfoData *buf, int64 intval);
-static void SendBackupHeader(List *tablespaces);
-static void perform_base_backup(basebackup_options *opt);
+static void perform_base_backup(basebackup_options *opt, bbsink *sink);
static void parse_basebackup_options(List *options, basebackup_options *opt);
-static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static int compareWalFileNames(const ListCell *a, const ListCell *b);
-static void throttle(size_t increment);
-static void update_basebackup_progress(int64 delta);
static bool is_checksummed_file(const char *fullpath, const char *filename);
static int basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
const char *filename, bool partial_read_ok);
@@ -90,46 +95,12 @@ static bool backup_started_in_recovery = false;
/* Relative path of temporary statistics directory */
static char *statrelpath = NULL;
-/*
- * Size of each block sent into the tar stream for larger files.
- */
-#define TAR_SEND_SIZE 32768
-
-/*
- * How frequently to throttle, as a fraction of the specified rate-second.
- */
-#define THROTTLING_FREQUENCY 8
-
-/* The actual number of bytes, transfer of which may cause sleep. */
-static uint64 throttling_sample;
-
-/* Amount of data already transferred but not yet throttled. */
-static int64 throttling_counter;
-
-/* The minimum time required to transfer throttling_sample bytes. */
-static TimeOffset elapsed_min_unit;
-
-/* The last check of the transfer rate. */
-static TimestampTz throttled_last;
-
-/* The starting XLOG position of the base backup. */
-static XLogRecPtr startptr;
-
/* Total number of checksum failures during base backup. */
static long long int total_checksum_failures;
/* Do not verify checksums. */
static bool noverify_checksums = false;
-/*
- * Total amount of backup data that will be streamed.
- * -1 means that the size is not estimated.
- */
-static int64 backup_total = 0;
-
-/* Amount of backup data already streamed */
-static int64 backup_streamed = 0;
-
/*
* Definition of one element part of an exclusion list, used for paths part
* of checksum validation or base backups. "name" is the name of the file
@@ -253,32 +224,22 @@ static const struct exclude_list_item noChecksumFiles[] = {
* clobbered by longjmp" from stupider versions of gcc.
*/
static void
-perform_base_backup(basebackup_options *opt)
+perform_base_backup(basebackup_options *opt, bbsink *sink)
{
- TimeLineID starttli;
+ bbsink_state state;
XLogRecPtr endptr;
TimeLineID endtli;
StringInfo labelfile;
StringInfo tblspc_map_file;
backup_manifest_info manifest;
int datadirpathlen;
- List *tablespaces = NIL;
- backup_total = 0;
- backup_streamed = 0;
- pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
-
- /*
- * If the estimation of the total backup size is disabled, make the
- * backup_total column in the view return NULL by setting the parameter to
- * -1.
- */
- if (!opt->progress)
- {
- backup_total = -1;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- backup_total);
- }
+ /* Initial backup state, insofar as we know it now. */
+ state.tablespaces = NIL;
+ state.tablespace_num = 0;
+ state.bytes_done = 0;
+ state.bytes_total = 0;
+ state.bytes_total_is_valid = false;
/* we're going to use a BufFile, so we need a ResourceOwner */
Assert(CurrentResourceOwner == NULL);
@@ -295,11 +256,11 @@ perform_base_backup(basebackup_options *opt)
total_checksum_failures = 0;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
- startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
- labelfile, &tablespaces,
- tblspc_map_file);
+ basebackup_progress_wait_checkpoint();
+ state.startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint,
+ &state.starttli,
+ labelfile, &state.tablespaces,
+ tblspc_map_file);
/*
* Once do_pg_start_backup has been called, ensure that any failure causes
@@ -312,7 +273,6 @@ perform_base_backup(basebackup_options *opt)
{
ListCell *lc;
tablespaceinfo *ti;
- int tblspc_streamed = 0;
/*
* Calculate the relative path of temporary statistics directory in
@@ -329,7 +289,7 @@ perform_base_backup(basebackup_options *opt)
/* Add a node for the base directory at the end */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = -1;
- tablespaces = lappend(tablespaces, ti);
+ state.tablespaces = lappend(state.tablespaces, ti);
/*
* Calculate the total backup size by summing up the size of each
@@ -337,100 +297,53 @@ perform_base_backup(basebackup_options *opt)
*/
if (opt->progress)
{
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+ basebackup_progress_estimate_backup_size();
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *tmp = (tablespaceinfo *) lfirst(lc);
if (tmp->path == NULL)
- tmp->size = sendDir(".", 1, true, tablespaces, true, NULL,
- NULL);
+ tmp->size = sendDir(sink, ".", 1, true, state.tablespaces,
+ true, NULL, NULL);
else
- tmp->size = sendTablespace(tmp->path, tmp->oid, true,
+ tmp->size = sendTablespace(sink, tmp->path, tmp->oid, true,
NULL);
- backup_total += tmp->size;
+ state.bytes_total += tmp->size;
}
+ state.bytes_total_is_valid = true;
}
- /* Report that we are now streaming database files as a base backup */
- {
- const int index[] = {
- PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL,
- PROGRESS_BASEBACKUP_TBLSPC_TOTAL
- };
- const int64 val[] = {
- PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP,
- backup_total, list_length(tablespaces)
- };
-
- pgstat_progress_update_multi_param(3, index, val);
- }
-
- /* Send the starting position of the backup */
- SendXlogRecPtrResult(startptr, starttli);
-
- /* Send tablespace header */
- SendBackupHeader(tablespaces);
-
- /* Setup and activate network throttling, if client requested it */
- if (opt->maxrate > 0)
- {
- throttling_sample =
- (int64) opt->maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
-
- /*
- * The minimum amount of time for throttling_sample bytes to be
- * transferred.
- */
- elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
-
- /* Enable throttling. */
- throttling_counter = 0;
-
- /* The 'real data' starts now (header was ignored). */
- throttled_last = GetCurrentTimestamp();
- }
- else
- {
- /* Disable throttling. */
- throttling_counter = -1;
- }
+ /* notify basebackup sink about start of backup */
+ bbsink_begin_backup(sink, &state, SINK_BUFFER_LENGTH);
/* Send off our tablespaces one by one */
- foreach(lc, tablespaces)
+ foreach(lc, state.tablespaces)
{
tablespaceinfo *ti = (tablespaceinfo *) lfirst(lc);
- StringInfoData buf;
-
- /* Send CopyOutResponse message */
- pq_beginmessage(&buf, 'H');
- pq_sendbyte(&buf, 0); /* overall format */
- pq_sendint16(&buf, 0); /* natts */
- pq_endmessage(&buf);
if (ti->path == NULL)
{
struct stat statbuf;
bool sendtblspclinks = true;
+ bbsink_begin_archive(sink, "base.tar");
+
/* In the main tar, include the backup_label first... */
- sendFileWithContent(BACKUP_LABEL_FILE, labelfile->data,
+ sendFileWithContent(sink, BACKUP_LABEL_FILE, labelfile->data,
&manifest);
/* Then the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
- sendFileWithContent(TABLESPACE_MAP, tblspc_map_file->data,
+ sendFileWithContent(sink, TABLESPACE_MAP, tblspc_map_file->data,
&manifest);
sendtblspclinks = false;
}
/* Then the bulk of the files... */
- sendDir(".", 1, false, tablespaces, sendtblspclinks,
- &manifest, NULL);
+ sendDir(sink, ".", 1, false, state.tablespaces,
+ sendtblspclinks, &manifest, NULL);
/* ... and pg_control after everything else. */
if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
@@ -438,32 +351,33 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
+ sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
false, InvalidOid, &manifest, NULL);
}
else
- sendTablespace(ti->path, ti->oid, false, &manifest);
+ {
+ char *archive_name = psprintf("%s.tar", ti->oid);
+
+ bbsink_begin_archive(sink, archive_name);
+
+ sendTablespace(sink, ti->path, ti->oid, false, &manifest);
+ }
/*
* If we're including WAL, and this is the main data directory we
- * don't terminate the tar stream here. Instead, we will append
- * the xlog files below and terminate it then. This is safe since
- * the main data directory is always sent *last*.
+ * don't treat this as the end of the tablespace. Instead, we will
+ * include the xlog files below and stop afterwards. This is safe
+ * since the main data directory is always sent *last*.
*/
if (opt->includewal && ti->path == NULL)
{
- Assert(lnext(tablespaces, lc) == NULL);
+ Assert(lnext(state.tablespaces, lc) == NULL);
}
else
- pq_putemptymessage('c'); /* CopyDone */
-
- tblspc_streamed++;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
- tblspc_streamed);
+ bbsink_end_archive(sink);
}
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE);
+ basebackup_progress_wait_wal_archive(&state);
endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
@@ -489,8 +403,7 @@ perform_base_backup(basebackup_options *opt)
ListCell *lc;
TimeLineID tli;
- pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
- PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+ basebackup_progress_transfer_wal();
/*
* I'd rather not worry about timelines here, so scan pg_wal and
@@ -501,7 +414,7 @@ perform_base_backup(basebackup_options *opt)
* shouldn't be such files, but if there are, there's little harm in
* including them.
*/
- XLByteToSeg(startptr, startsegno, wal_segment_size);
+ XLByteToSeg(state.startptr, startsegno, wal_segment_size);
XLogFileName(firstoff, ThisTimeLineID, startsegno, wal_segment_size);
XLByteToPrevSeg(endptr, endsegno, wal_segment_size);
XLogFileName(lastoff, ThisTimeLineID, endsegno, wal_segment_size);
@@ -591,7 +504,6 @@ perform_base_backup(basebackup_options *opt)
{
char *walFileName = (char *) lfirst(lc);
int fd;
- char buf[TAR_SEND_SIZE];
size_t cnt;
pgoff_t len = 0;
@@ -630,22 +542,17 @@ perform_base_backup(basebackup_options *opt)
}
/* send the WAL file itself */
- _tarWriteHeader(pathbuf, NULL, &statbuf, false);
+ _tarWriteHeader(sink, pathbuf, NULL, &statbuf, false);
- while ((cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf),
+ while ((cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length,
wal_segment_size - len),
len, pathbuf, true)) > 0)
{
CheckXLogRemoved(segno, tli);
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
len += cnt;
- throttle(cnt);
if (len == wal_segment_size)
break;
@@ -674,7 +581,7 @@ perform_base_backup(basebackup_options *opt)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
/*
@@ -697,23 +604,23 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid,
+ sendFile(sink, pathbuf, pathbuf, &statbuf, false, InvalidOid,
&manifest, NULL);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", &manifest);
}
- /* Send CopyDone message for the last tar file */
- pq_putemptymessage('c');
+ bbsink_end_archive(sink);
}
- AddWALInfoToBackupManifest(&manifest, startptr, starttli, endptr, endtli);
+ AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
+ endptr, endtli);
- SendBackupManifest(&manifest);
+ SendBackupManifest(&manifest, sink);
- SendXlogRecPtrResult(endptr, endtli);
+ bbsink_end_backup(sink, endptr, endtli);
if (total_checksum_failures)
{
@@ -739,7 +646,7 @@ perform_base_backup(basebackup_options *opt)
/* clean up the resource owner we created */
WalSndResourceCleanup(true);
- pgstat_progress_end_command();
+ basebackup_progress_done();
}
/*
@@ -944,6 +851,7 @@ void
SendBaseBackup(BaseBackupCmd *cmd)
{
basebackup_options opt;
+ bbsink *sink;
parse_basebackup_options(cmd->options, &opt);
@@ -958,158 +866,40 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- perform_base_backup(&opt);
-}
-
-static void
-send_int8_string(StringInfoData *buf, int64 intval)
-{
- char is[32];
-
- sprintf(is, INT64_FORMAT, intval);
- pq_sendint32(buf, strlen(is));
- pq_sendbytes(buf, is, strlen(is));
-}
-
-static void
-SendBackupHeader(List *tablespaces)
-{
- StringInfoData buf;
- ListCell *lc;
-
- /* Construct and send the directory information */
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 3); /* 3 fields */
-
- /* First field - spcoid */
- pq_sendstring(&buf, "spcoid");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, OIDOID); /* type oid */
- pq_sendint16(&buf, 4); /* typlen */
- pq_sendint32(&buf, 0); /* typmod */
- pq_sendint16(&buf, 0); /* format code */
-
- /* Second field - spclocation */
- pq_sendstring(&buf, "spclocation");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, TEXTOID);
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- /* Third field - size */
- pq_sendstring(&buf, "size");
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_sendint32(&buf, INT8OID);
- pq_sendint16(&buf, 8);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- foreach(lc, tablespaces)
- {
- tablespaceinfo *ti = lfirst(lc);
-
- /* Send one datarow message */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 3); /* number of columns */
- if (ti->path == NULL)
- {
- pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
- pq_sendint32(&buf, -1);
- }
- else
- {
- Size len;
-
- len = strlen(ti->oid);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->oid, len);
-
- len = strlen(ti->path);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, ti->path, len);
- }
- if (ti->size >= 0)
- send_int8_string(&buf, ti->size / 1024);
- else
- pq_sendint32(&buf, -1); /* NULL */
+ /* Create a basic basebackup sink. */
+ sink = bbsink_copytblspc_new();
- pq_endmessage(&buf);
- }
+ /* Set up network throttling, if client requested it */
+ if (opt.maxrate > 0)
+ sink = bbsink_throttle_new(sink, opt.maxrate);
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
-}
-
-/*
- * Send a single resultset containing just a single
- * XLogRecPtr record (in text format)
- */
-static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
-{
- StringInfoData buf;
- char str[MAXFNAMELEN];
- Size len;
-
- pq_beginmessage(&buf, 'T'); /* RowDescription */
- pq_sendint16(&buf, 2); /* 2 fields */
-
- /* Field headers */
- pq_sendstring(&buf, "recptr");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
- pq_sendint32(&buf, TEXTOID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
-
- pq_sendstring(&buf, "tli");
- pq_sendint32(&buf, 0); /* table oid */
- pq_sendint16(&buf, 0); /* attnum */
+ /* Set up progress reporting. */
+ sink = bbsink_progress_new(sink, opt.progress);
/*
- * int8 may seem like a surprising data type for this, but in theory int4
- * would not be wide enough for this, as TimeLineID is unsigned.
+ * Perform the base backup, but make sure we clean up the bbsink even if
+ * an error occurs.
*/
- pq_sendint32(&buf, INT8OID); /* type oid */
- pq_sendint16(&buf, -1);
- pq_sendint32(&buf, 0);
- pq_sendint16(&buf, 0);
- pq_endmessage(&buf);
-
- /* Data row */
- pq_beginmessage(&buf, 'D');
- pq_sendint16(&buf, 2); /* number of columns */
-
- len = snprintf(str, sizeof(str),
- "%X/%X", LSN_FORMAT_ARGS(ptr));
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- len = snprintf(str, sizeof(str), "%u", tli);
- pq_sendint32(&buf, len);
- pq_sendbytes(&buf, str, len);
-
- pq_endmessage(&buf);
-
- /* Send a CommandComplete message */
- pq_puttextmessage('C', "SELECT");
+ PG_TRY();
+ {
+ perform_base_backup(&opt, sink);
+ }
+ PG_FINALLY();
+ {
+ bbsink_cleanup(sink);
+ }
+ PG_END_TRY();
}
/*
* Inject a file with given name and content in the output tar stream.
*/
static void
-sendFileWithContent(const char *filename, const char *content,
+sendFileWithContent(bbsink *sink, const char *filename, const char *content,
backup_manifest_info *manifest)
{
struct stat statbuf;
- int pad,
+ int bytes_done = 0,
len;
pg_checksum_context checksum_ctx;
@@ -1135,25 +925,23 @@ sendFileWithContent(const char *filename, const char *content,
statbuf.st_mode = pg_file_create_mode;
statbuf.st_size = len;
- _tarWriteHeader(filename, NULL, &statbuf, false);
- /* Send the contents as a CopyData message */
- pq_putmessage('d', content, len);
- update_basebackup_progress(len);
+ _tarWriteHeader(sink, filename, NULL, &statbuf, false);
- /* Pad to a multiple of the tar block size. */
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
+ if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
+ elog(ERROR, "could not update checksum of file \"%s\"",
+ filename);
+
+ while (bytes_done < len)
{
- char buf[TAR_BLOCK_SIZE];
+ size_t remaining = len - bytes_done;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
+ memcpy(sink->bbs_buffer, content, nbytes);
+ bbsink_archive_contents(sink, nbytes);
+ bytes_done += nbytes;
}
- if (pg_checksum_update(&checksum_ctx, (uint8 *) content, len) < 0)
- elog(ERROR, "could not update checksum of file \"%s\"",
- filename);
+ _tarWritePadding(sink, len);
AddFileToBackupManifest(manifest, NULL, filename, len,
(pg_time_t) statbuf.st_mtime, &checksum_ctx);
@@ -1167,7 +955,7 @@ sendFileWithContent(const char *filename, const char *content,
* Only used to send auxiliary tablespaces, not PGDATA.
*/
static int64
-sendTablespace(char *path, char *spcoid, bool sizeonly,
+sendTablespace(bbsink *sink, char *path, char *spcoid, bool sizeonly,
backup_manifest_info *manifest)
{
int64 size;
@@ -1197,11 +985,11 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
return 0;
}
- size = _tarWriteHeader(TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
+ size = _tarWriteHeader(sink, TABLESPACE_VERSION_DIRECTORY, NULL, &statbuf,
sizeonly);
/* Send all the files in the tablespace version directory */
- size += sendDir(pathbuf, strlen(path), sizeonly, NIL, true, manifest,
+ size += sendDir(sink, pathbuf, strlen(path), sizeonly, NIL, true, manifest,
spcoid);
return size;
@@ -1220,8 +1008,8 @@ sendTablespace(char *path, char *spcoid, bool sizeonly,
* as it will be sent separately in the tablespace_map file.
*/
static int64
-sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
- bool sendtblspclinks, backup_manifest_info *manifest,
+sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
+ List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)
{
DIR *dir;
@@ -1381,8 +1169,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
excludeFound = true;
break;
}
@@ -1399,8 +1187,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
elog(DEBUG1, "contents of directory \"%s\" excluded from backup", statrelpath);
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
continue;
}
@@ -1413,15 +1201,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
{
/* If pg_wal is a symlink, write it as a directory anyway */
convert_link_to_directory(pathbuf, &statbuf);
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL,
+ &statbuf, sizeonly);
/*
* Also send archive_status directory (by hackishly reusing
* statbuf from above ...).
*/
- size += _tarWriteHeader("./pg_wal/archive_status", NULL, &statbuf,
- sizeonly);
+ size += _tarWriteHeader(sink, "./pg_wal/archive_status", NULL,
+ &statbuf, sizeonly);
continue; /* don't recurse into pg_wal */
}
@@ -1452,7 +1240,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
pathbuf)));
linkpath[rllen] = '\0';
- size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, linkpath,
&statbuf, sizeonly);
#else
@@ -1476,7 +1264,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* Store a directory entry in the tar file so we can get the
* permissions right.
*/
- size += _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
+ size += _tarWriteHeader(sink, pathbuf + basepathlen + 1, NULL, &statbuf,
sizeonly);
/*
@@ -1508,7 +1296,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
skip_this_dir = true;
if (!skip_this_dir)
- size += sendDir(pathbuf, basepathlen, sizeonly, tablespaces,
+ size += sendDir(sink, pathbuf, basepathlen, sizeonly, tablespaces,
sendtblspclinks, manifest, spcoid);
}
else if (S_ISREG(statbuf.st_mode))
@@ -1516,7 +1304,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sent = false;
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
+ sent = sendFile(sink, pathbuf, pathbuf + basepathlen + 1, &statbuf,
true, isDbDir ? atooid(lastDir + 1) : InvalidOid,
manifest, spcoid);
@@ -1593,21 +1381,19 @@ is_checksummed_file(const char *fullpath, const char *filename)
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename,
+sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
struct stat *statbuf, bool missing_ok, Oid dboid,
backup_manifest_info *manifest, const char *spcoid)
{
int fd;
BlockNumber blkno = 0;
bool block_retry = false;
- char buf[TAR_SEND_SIZE];
uint16 checksum;
int checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
char *page;
- size_t pad;
PageHeader phdr;
int segmentno = 0;
char *segmentpath;
@@ -1628,7 +1414,7 @@ sendFile(const char *readfilename, const char *tarfilename,
errmsg("could not open file \"%s\": %m", readfilename)));
}
- _tarWriteHeader(tarfilename, NULL, statbuf, false);
+ _tarWriteHeader(sink, tarfilename, NULL, statbuf, false);
if (!noverify_checksums && DataChecksumsEnabled())
{
@@ -1669,9 +1455,11 @@ sendFile(const char *readfilename, const char *tarfilename,
*/
while (len < statbuf->st_size)
{
+ size_t remaining = statbuf->st_size - len;
+
/* Try to read some more data. */
- cnt = basebackup_read_file(fd, buf,
- Min(sizeof(buf), statbuf->st_size - len),
+ cnt = basebackup_read_file(fd, sink->bbs_buffer,
+ Min(sink->bbs_buffer_length, remaining),
len, readfilename, true);
/*
@@ -1688,7 +1476,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read a multiple of
* BLCKSZ bytes.
*/
- Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
if (verify_checksum && (cnt % BLCKSZ != 0))
{
@@ -1704,7 +1492,7 @@ sendFile(const char *readfilename, const char *tarfilename,
{
for (i = 0; i < cnt / BLCKSZ; i++)
{
- page = buf + BLCKSZ * i;
+ page = sink->bbs_buffer + BLCKSZ * i;
/*
* Only check pages which have not been modified since the
@@ -1714,7 +1502,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* this case. We also skip completely new pages, since they
* don't have a checksum yet.
*/
- if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+ if (!PageIsNew(page) && PageGetLSN(page) < sink->bbs_state->startptr)
{
checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
phdr = (PageHeader) page;
@@ -1736,7 +1524,8 @@ sendFile(const char *readfilename, const char *tarfilename,
/* Reread the failed block */
reread_cnt =
- basebackup_read_file(fd, buf + BLCKSZ * i,
+ basebackup_read_file(fd,
+ sink->bbs_buffer + BLCKSZ * i,
BLCKSZ, len + BLCKSZ * i,
readfilename,
false);
@@ -1783,34 +1572,29 @@ sendFile(const char *readfilename, const char *tarfilename,
}
}
- /* Send the chunk as a CopyData message */
- if (pq_putmessage('d', buf, cnt))
- ereport(ERROR,
- (errmsg("base backup could not send data, aborting backup")));
- update_basebackup_progress(cnt);
+ bbsink_archive_contents(sink, cnt);
/* Also feed it to the checksum machinery. */
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer, cnt) < 0)
elog(ERROR, "could not update checksum of base backup");
len += cnt;
- throttle(cnt);
}
/* If the file was truncated while we were sending it, pad it with zeros */
- if (len < statbuf->st_size)
+ while (len < statbuf->st_size)
{
- MemSet(buf, 0, sizeof(buf));
- while (len < statbuf->st_size)
- {
- cnt = Min(sizeof(buf), statbuf->st_size - len);
- pq_putmessage('d', buf, cnt);
- if (pg_checksum_update(&checksum_ctx, (uint8 *) buf, cnt) < 0)
- elog(ERROR, "could not update checksum of base backup");
- update_basebackup_progress(cnt);
- len += cnt;
- throttle(cnt);
- }
+ size_t remaining = statbuf->st_size - len;
+ size_t nbytes = Min(sink->bbs_buffer_length, remaining);
+
+ MemSet(sink->bbs_buffer, 0, nbytes);
+ if (pg_checksum_update(&checksum_ctx,
+ (uint8 *) sink->bbs_buffer,
+ nbytes) < 0)
+ elog(ERROR, "could not update checksum of base backup");
+ bbsink_archive_contents(sink, nbytes);
+ len += nbytes;
}
/*
@@ -1818,13 +1602,7 @@ sendFile(const char *readfilename, const char *tarfilename,
* of data is probably not worth throttling, and is not checksummed
* because it's not actually part of the file.)
*/
- pad = tarPaddingBytesRequired(len);
- if (pad > 0)
- {
- MemSet(buf, 0, pad);
- pq_putmessage('d', buf, pad);
- update_basebackup_progress(pad);
- }
+ _tarWritePadding(sink, len);
CloseTransientFile(fd);
@@ -1847,18 +1625,28 @@ sendFile(const char *readfilename, const char *tarfilename,
return true;
}
-
static int64
-_tarWriteHeader(const char *filename, const char *linktarget,
+_tarWriteHeader(bbsink *sink, const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly)
{
- char h[TAR_BLOCK_SIZE];
enum tarError rc;
if (!sizeonly)
{
- rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
- statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
+ /*
+ * As of this writing, the smallest supported block size is 1kB, which
+ * is twice TAR_BLOCK_SIZE. Since the buffer size is required to be a
+ * multiple of BLCKSZ, it should be safe to assume that the buffer is
+ * large enough to fit an entire tar block. We double-check by means
+ * of these assertions.
+ */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= BLCKSZ,
+ "BLCKSZ too small for tar block");
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ rc = tarCreateHeader(sink->bbs_buffer, filename, linktarget,
+ statbuf->st_size, statbuf->st_mode,
+ statbuf->st_uid, statbuf->st_gid,
statbuf->st_mtime);
switch (rc)
@@ -1880,134 +1668,48 @@ _tarWriteHeader(const char *filename, const char *linktarget,
elog(ERROR, "unrecognized tar error: %d", rc);
}
- pq_putmessage('d', h, sizeof(h));
- update_basebackup_progress(sizeof(h));
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
}
- return sizeof(h);
+ return TAR_BLOCK_SIZE;
}
/*
- * If the entry in statbuf is a link, then adjust statbuf to make it look like a
- * directory, so that it will be written that way.
+ * Pad with zero bytes out to a multiple of TAR_BLOCK_SIZE.
*/
static void
-convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
+_tarWritePadding(bbsink *sink, int len)
{
- /* If symlink, write it as a directory anyway */
-#ifndef WIN32
- if (S_ISLNK(statbuf->st_mode))
-#else
- if (pgwin32_is_junction(pathbuf))
-#endif
- statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
-}
-
-/*
- * Increment the network transfer counter by the given number of bytes,
- * and sleep if necessary to comply with the requested network transfer
- * rate.
- */
-static void
-throttle(size_t increment)
-{
- TimeOffset elapsed_min;
-
- if (throttling_counter < 0)
- return;
-
- throttling_counter += increment;
- if (throttling_counter < throttling_sample)
- return;
-
- /* How much time should have elapsed at minimum? */
- elapsed_min = elapsed_min_unit *
- (throttling_counter / throttling_sample);
+ int pad = tarPaddingBytesRequired(len);
/*
- * Since the latch could be set repeatedly because of concurrently WAL
- * activity, sleep in a loop to ensure enough time has passed.
+ * As in _tarWriteHeader, it should be safe to assume that the buffer is
+ * large enough that we don't need to do this in multiple chunks.
*/
- for (;;)
- {
- TimeOffset elapsed,
- sleep;
- int wait_result;
-
- /* Time elapsed since the last measurement (and possible wake up). */
- elapsed = GetCurrentTimestamp() - throttled_last;
-
- /* sleep if the transfer is faster than it should be */
- sleep = elapsed_min - elapsed;
- if (sleep <= 0)
- break;
-
- ResetLatch(MyLatch);
-
- /* We're eating a potentially set latch, so check for interrupts */
- CHECK_FOR_INTERRUPTS();
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+ Assert(pad <= TAR_BLOCK_SIZE);
- /*
- * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
- * the maximum time to sleep. Thus the cast to long is safe.
- */
- wait_result = WaitLatch(MyLatch,
- WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- (long) (sleep / 1000),
- WAIT_EVENT_BASE_BACKUP_THROTTLE);
-
- if (wait_result & WL_LATCH_SET)
- CHECK_FOR_INTERRUPTS();
-
- /* Done waiting? */
- if (wait_result & WL_TIMEOUT)
- break;
+ if (pad > 0)
+ {
+ MemSet(sink->bbs_buffer, 0, pad);
+ bbsink_archive_contents(sink, pad);
}
-
- /*
- * As we work with integers, only whole multiple of throttling_sample was
- * processed. The rest will be done during the next call of this function.
- */
- throttling_counter %= throttling_sample;
-
- /*
- * Time interval for the remaining amount and possible next increments
- * starts now.
- */
- throttled_last = GetCurrentTimestamp();
}
/*
- * Increment the counter for the amount of data already streamed
- * by the given number of bytes, and update the progress report for
- * pg_stat_progress_basebackup.
+ * If the entry in statbuf is a link, then adjust statbuf to make it look like a
+ * directory, so that it will be written that way.
*/
static void
-update_basebackup_progress(int64 delta)
+convert_link_to_directory(const char *pathbuf, struct stat *statbuf)
{
- const int index[] = {
- PROGRESS_BASEBACKUP_BACKUP_STREAMED,
- PROGRESS_BASEBACKUP_BACKUP_TOTAL
- };
- int64 val[2];
- int nparam = 0;
-
- backup_streamed += delta;
- val[nparam++] = backup_streamed;
-
- /*
- * Avoid overflowing past 100% or the full size. This may make the total
- * size number change as we approach the end of the backup (the estimate
- * will always be wrong if WAL is included), but that's better than having
- * the done column be bigger than the total.
- */
- if (backup_total > -1 && backup_streamed > backup_total)
- {
- backup_total = backup_streamed;
- val[nparam++] = backup_total;
- }
-
- pgstat_progress_update_multi_param(nparam, index, val);
+ /* If symlink, write it as a directory anyway */
+#ifndef WIN32
+ if (S_ISLNK(statbuf->st_mode))
+#else
+ if (pgwin32_is_junction(pathbuf))
+#endif
+ statbuf->st_mode = S_IFDIR | pg_dir_create_mode;
}
/*
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
new file mode 100644
index 0000000000..30bab4546e
--- /dev/null
+++ b/src/backend/replication/basebackup_copy.c
@@ -0,0 +1,335 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_copy.c
+ * send basebackup archives using one COPY OUT operation per
+ * tablespace, and an additional COPY OUT for the backup manifest
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_copy.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_type_d.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+
+static void bbsink_copytblspc_begin_backup(bbsink *sink);
+static void bbsink_copytblspc_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copytblspc_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_archive(bbsink *sink);
+static void bbsink_copytblspc_begin_manifest(bbsink *sink);
+static void bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copytblspc_end_manifest(bbsink *sink);
+static void bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+static void bbsink_copytblspc_cleanup(bbsink *sink);
+
+static void SendCopyOutResponse(void);
+static void SendCopyData(const char *data, size_t len);
+static void SendCopyDone(void);
+static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
+static void SendTablespaceList(List *tablespaces);
+static void send_int8_string(StringInfoData *buf, int64 intval);
+
+const bbsink_ops bbsink_copytblspc_ops = {
+ .begin_backup = bbsink_copytblspc_begin_backup,
+ .begin_archive = bbsink_copytblspc_begin_archive,
+ .archive_contents = bbsink_copytblspc_archive_contents,
+ .end_archive = bbsink_copytblspc_end_archive,
+ .begin_manifest = bbsink_copytblspc_begin_manifest,
+ .manifest_contents = bbsink_copytblspc_manifest_contents,
+ .end_manifest = bbsink_copytblspc_end_manifest,
+ .end_backup = bbsink_copytblspc_end_backup,
+ .cleanup = bbsink_copytblspc_cleanup
+};
+
+/*
+ * Create a new 'copytblspc' bbsink.
+ */
+bbsink *
+bbsink_copytblspc_new(void)
+{
+ bbsink *sink = palloc0(sizeof(bbsink));
+
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_copytblspc_ops;
+
+ return sink;
+}
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_copytblspc_begin_backup(bbsink *sink)
+{
+ bbsink_state *state = sink->bbs_state;
+
+ /* Create a suitable buffer. */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Each archive is set as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of data within the archive is sent as a CopyData message.
+ */
+static void
+bbsink_copytblspc_archive_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * The archive is terminated by a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_archive(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * The backup manifest is sent as a separate stream of COPY data, and thus
+ * begins with a CopyOutResponse message.
+ */
+static void
+bbsink_copytblspc_begin_manifest(bbsink *sink)
+{
+ SendCopyOutResponse();
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copytblspc_manifest_contents(bbsink *sink, size_t len)
+{
+ SendCopyData(sink->bbs_buffer, len);
+}
+
+/*
+ * When we've finished sending the manifest, send a CopyDone message.
+ */
+static void
+bbsink_copytblspc_end_manifest(bbsink *sink)
+{
+ SendCopyDone();
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copytblspc_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Cleanup.
+ */
+static void
+bbsink_copytblspc_cleanup(bbsink *sink)
+{
+ /* Nothing to do. */
+}
+
+/*
+ * Send a CopyOutResponse message.
+ */
+static void
+SendCopyOutResponse(void)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'H');
+ pq_sendbyte(&buf, 0); /* overall format */
+ pq_sendint16(&buf, 0); /* natts */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message.
+ */
+static void
+SendCopyData(const char *data, size_t len)
+{
+ pq_putmessage('d', data, len);
+}
+
+/*
+ * Send a CopyDone message.
+ */
+static void
+SendCopyDone(void)
+{
+ pq_putemptymessage('c');
+}
+
+/*
+ * Send a single resultset containing just a single
+ * XLogRecPtr record (in text format)
+ */
+static void
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+{
+ StringInfoData buf;
+ char str[MAXFNAMELEN];
+ Size len;
+
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 2); /* 2 fields */
+
+ /* Field headers */
+ pq_sendstring(&buf, "recptr");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, TEXTOID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ pq_sendstring(&buf, "tli");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+
+ /*
+ * int8 may seem like a surprising data type for this, but in theory int4
+ * would not be wide enough for this, as TimeLineID is unsigned.
+ */
+ pq_sendint32(&buf, INT8OID); /* type oid */
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ /* Data row */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 2); /* number of columns */
+
+ len = snprintf(str, sizeof(str),
+ "%X/%X", LSN_FORMAT_ARGS(ptr));
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ len = snprintf(str, sizeof(str), "%u", tli);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, str, len);
+
+ pq_endmessage(&buf);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+}
+
+/*
+ * Send a result set via libpq describing the tablespace list.
+ */
+static void
+SendTablespaceList(List *tablespaces)
+{
+ StringInfoData buf;
+ ListCell *lc;
+
+ /* Construct and send the directory information */
+ pq_beginmessage(&buf, 'T'); /* RowDescription */
+ pq_sendint16(&buf, 3); /* 3 fields */
+
+ /* First field - spcoid */
+ pq_sendstring(&buf, "spcoid");
+ pq_sendint32(&buf, 0); /* table oid */
+ pq_sendint16(&buf, 0); /* attnum */
+ pq_sendint32(&buf, OIDOID); /* type oid */
+ pq_sendint16(&buf, 4); /* typlen */
+ pq_sendint32(&buf, 0); /* typmod */
+ pq_sendint16(&buf, 0); /* format code */
+
+ /* Second field - spclocation */
+ pq_sendstring(&buf, "spclocation");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, TEXTOID);
+ pq_sendint16(&buf, -1);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+
+ /* Third field - size */
+ pq_sendstring(&buf, "size");
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_sendint32(&buf, INT8OID);
+ pq_sendint16(&buf, 8);
+ pq_sendint32(&buf, 0);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+
+ foreach(lc, tablespaces)
+ {
+ tablespaceinfo *ti = lfirst(lc);
+
+ /* Send one datarow message */
+ pq_beginmessage(&buf, 'D');
+ pq_sendint16(&buf, 3); /* number of columns */
+ if (ti->path == NULL)
+ {
+ pq_sendint32(&buf, -1); /* Length = -1 ==> NULL */
+ pq_sendint32(&buf, -1);
+ }
+ else
+ {
+ Size len;
+
+ len = strlen(ti->oid);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->oid, len);
+
+ len = strlen(ti->path);
+ pq_sendint32(&buf, len);
+ pq_sendbytes(&buf, ti->path, len);
+ }
+ if (ti->size >= 0)
+ send_int8_string(&buf, ti->size / 1024);
+ else
+ pq_sendint32(&buf, -1); /* NULL */
+
+ pq_endmessage(&buf);
+ }
+}
+
+/*
+ * Send a 64-bit integer as a string via the wire protocol.
+ */
+static void
+send_int8_string(StringInfoData *buf, int64 intval)
+{
+ char is[32];
+
+ sprintf(is, INT64_FORMAT, intval);
+ pq_sendint32(buf, strlen(is));
+ pq_sendbytes(buf, is, strlen(is));
+}
diff --git a/src/backend/replication/basebackup_progress.c b/src/backend/replication/basebackup_progress.c
new file mode 100644
index 0000000000..e1a196251e
--- /dev/null
+++ b/src/backend/replication/basebackup_progress.c
@@ -0,0 +1,246 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_progress.c
+ * Basebackup sink implementing progress tracking, including but not
+ * limited to command progress reporting.
+ *
+ * This should be used even if the PROGRESS option to the replication
+ * command BASE_BACKUP is not specified. Without that option, we won't
+ * have tallied up the size of the files that are going to need to be
+ * backed up, but we can still report to the command progress reporting
+ * facility how much data we've processed.
+ *
+ * Moreover, we also use this as a convenient place to update certain
+ * fields of the bbsink_state. That work is accurately described as
+ * keeping track of our progress, but it's not just for introspection.
+ * We need those fields to be updated properly in order for base backups
+ * to work.
+ *
+ * This particular basebackup sink requires extra callbacks that most base
+ * backup sinks don't. Rather than cramming those into the interface, we just
+ * have a few extra functions here that basebackup.c can call. (We could put
+ * the logic directly into that file as it's fairly simple, but it seems
+ * cleaner to have everything related to progress reporting in one place.)
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_progress.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "commands/progress.h"
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+static void bbsink_progress_begin_backup(bbsink *sink);
+static void bbsink_progress_archive_contents(bbsink *sink, size_t len);
+static void bbsink_progress_end_archive(bbsink *sink);
+
+const bbsink_ops bbsink_progress_ops = {
+ .begin_backup = bbsink_progress_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_progress_archive_contents,
+ .end_archive = bbsink_progress_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_forward_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * Create a new basebackup sink that performs progress tracking functions and
+ * forwards data to a successor sink.
+ */
+bbsink *
+bbsink_progress_new(bbsink *next, bool estimate_backup_size)
+{
+ bbsink *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink));
+ *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_progress_ops;
+ sink->bbs_next = next;
+
+ /*
+ * Report that a base backup is in progress, and set the total size of the
+ * backup to -1, which will get translated to NULL. If we're estimating
+ * the backup size, we'll insert the real estimate when we have it.
+ */
+ pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL, -1);
+
+ return sink;
+}
+
+/*
+ * Progress reporting at start of backup.
+ */
+static void
+bbsink_progress_begin_backup(bbsink *sink)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+ PROGRESS_BASEBACKUP_TBLSPC_TOTAL
+ };
+ int64 val[3];
+
+ /*
+ * Report that we are now streaming database files as a base backup. Also
+ * advertise the number of tablespaces, and, if known, the estimated total
+ * backup size.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP;
+ if (sink->bbs_state->bytes_total_is_valid)
+ val[1] = sink->bbs_state->bytes_total;
+ else
+ val[1] = -1;
+ val[2] = list_length(sink->bbs_state->tablespaces);
+ pgstat_progress_update_multi_param(3, index, val);
+
+ /* Delegate to next sink. */
+ bbsink_forward_begin_backup(sink);
+}
+
+/*
+ * End-of archive progress reporting.
+ */
+static void
+bbsink_progress_end_archive(bbsink *sink)
+{
+ /*
+ * We expect one archive per tablespace, so reaching the end of an archive
+ * also means reaching the end of a tablespace. (Some day we might have a
+ * reason to decouple these concepts.)
+ *
+ * If WAL is included in the backup, we'll mark the last tablespace
+ * complete before the last archive is complete, so we need a guard here
+ * to ensure that the number of tablespaces streamed doesn't exceed the
+ * total.
+ */
+ if (sink->bbs_state->tablespace_num < list_length(sink->bbs_state->tablespaces))
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_TBLSPC_STREAMED,
+ sink->bbs_state->tablespace_num + 1);
+
+ /* Delegate to next sink. */
+ bbsink_forward_end_archive(sink);
+
+ /*
+ * This is a convenient place to update the bbsink_state's notion of which
+ * is the current tablespace. Note that the bbsink_state object is shared
+ * across all bbsink objects involved, but we're the outermost one and
+ * this is the very last thing we do.
+ */
+ sink->bbs_state->tablespace_num++;
+}
+
+/*
+ * Handle progress tracking for new archive contents.
+ *
+ * Increment the counter for the amount of data already streamed
+ * by the given number of bytes, and update the progress report for
+ * pg_stat_progress_basebackup.
+ */
+static void
+bbsink_progress_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_state *state = sink->bbs_state;
+ const int index[] = {
+ PROGRESS_BASEBACKUP_BACKUP_STREAMED,
+ PROGRESS_BASEBACKUP_BACKUP_TOTAL
+ };
+ int64 val[2];
+ int nparam = 0;
+
+ /* First update bbsink_state with # of bytes done. */
+ state->bytes_done += len;
+
+ /* Now forward to next sink. */
+ bbsink_forward_archive_contents(sink, len);
+
+ /* Prepare to set # of bytes done for command progress reporting. */
+ val[nparam++] = state->bytes_done;
+
+ /*
+ * We may also want to update # of total bytes, to avoid overflowing past
+ * 100% or the full size. This may make the total size number change as we
+ * approach the end of the backup (the estimate will always be wrong if
+ * WAL is included), but that's better than having the done column be
+ * bigger than the total.
+ */
+ if (state->bytes_total_is_valid && state->bytes_done > state->bytes_total)
+ val[nparam++] = state->bytes_done;
+
+ pgstat_progress_update_multi_param(nparam, index, val);
+}
+
+/*
+ * Advertise that we are waiting for the start-of-backup checkpoint.
+ */
+void
+basebackup_progress_wait_checkpoint(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT);
+}
+
+/*
+ * Advertise that we are estimating the backup size.
+ */
+void
+basebackup_progress_estimate_backup_size(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE);
+}
+
+/*
+ * Advertise that we are waiting for WAL archiving at end-of-backup.
+ */
+void
+basebackup_progress_wait_wal_archive(bbsink_state *state)
+{
+ const int index[] = {
+ PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_TBLSPC_STREAMED
+ };
+ int64 val[2];
+
+ /*
+ * We report having finished all tablespaces at this point, even if the
+ * archive for the main tablespace is still open, because what's going to
+ * be added is WAL files, not files that are really from the main
+ * tablespace.
+ */
+ val[0] = PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE;
+ val[1] = list_length(state->tablespaces);
+ pgstat_progress_update_multi_param(2, index, val);
+}
+
+/*
+ * Advertise that we are transferring WAL files into the final archive.
+ */
+void
+basebackup_progress_transfer_wal(void)
+{
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+ PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL);
+}
+
+/*
+ * Advertise that we are no longer performing a backup.
+ */
+void
+basebackup_progress_done(void)
+{
+ pgstat_progress_end_command();
+}
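For orientation, here is an editorial sketch -- not part of the patch -- of
the order in which a backup is expected to drive these hooks; the bbsink
calls that move the actual data are elided:

    /* Sketch of the expected progress-hook sequence (hypothetical driver). */
    basebackup_progress_wait_checkpoint();
    /* ... wait for the start-of-backup checkpoint ... */
    basebackup_progress_estimate_backup_size();
    /* ... size $PGDATA, then stream each tablespace via the bbsink chain ... */
    basebackup_progress_wait_wal_archive(&state);
    basebackup_progress_transfer_wal();
    /* ... append WAL files to the final archive ... */
    basebackup_progress_done();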
diff --git a/src/backend/replication/basebackup_sink.c b/src/backend/replication/basebackup_sink.c
new file mode 100644
index 0000000000..4a47854f81
--- /dev/null
+++ b/src/backend/replication/basebackup_sink.c
@@ -0,0 +1,125 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.c
+ * Default implementations for bbsink (basebackup sink) callbacks.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/replication/basebackup_sink.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "replication/basebackup_sink.h"
+
+/*
+ * Forward begin_backup callback.
+ *
+ * Only use this implementation if you want the bbsink you're implementing to
+ * share a buffer with the successor bbsink.
+ */
+void
+bbsink_forward_begin_backup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_state != NULL);
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+ sink->bbs_buffer = sink->bbs_next->bbs_buffer;
+}
+
+/*
+ * Forward begin_archive callback.
+ */
+void
+bbsink_forward_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, archive_name);
+}
+
+/*
+ * Forward archive_contents callback.
+ *
+ * Code that wants to use this should initialize its own bbs_buffer and
+ * bbs_buffer_length fields to the values from the successor sink. In cases
+ * where the buffer isn't shared, the data needs to be copied before forwarding
+ * the callback. We don't try to do that here, because there's really no
+ * reason to have separately allocated buffers containing identical data.
+ */
+void
+bbsink_forward_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_archive_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_archive callback.
+ */
+void
+bbsink_forward_end_archive(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_archive(sink->bbs_next);
+}
+
+/*
+ * Forward begin_manifest callback.
+ */
+void
+bbsink_forward_begin_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward manifest_contents callback.
+ *
+ * As with the archive_contents callback, it's expected that the buffer is
+ * shared.
+ */
+void
+bbsink_forward_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink->bbs_next != NULL);
+ Assert(sink->bbs_buffer == sink->bbs_next->bbs_buffer);
+ Assert(sink->bbs_buffer_length == sink->bbs_next->bbs_buffer_length);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Forward end_manifest callback.
+ */
+void
+bbsink_forward_end_manifest(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_manifest(sink->bbs_next);
+}
+
+/*
+ * Forward end_backup callback.
+ */
+void
+bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_end_backup(sink->bbs_next, endptr, endtli);
+}
+
+/*
+ * Forward cleanup callback.
+ */
+void
+bbsink_forward_cleanup(bbsink *sink)
+{
+ Assert(sink->bbs_next != NULL);
+ bbsink_cleanup(sink->bbs_next);
+}
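To illustrate how little a pure pass-through filter needs, here's a hedged
sketch of a hypothetical do-nothing sink -- bbsink_noop is not part of the
patch set -- built entirely from the forwarding callbacks above:

    /* Hypothetical no-op filter sink; every callback simply forwards. */
    static const bbsink_ops bbsink_noop_ops = {
        .begin_backup = bbsink_forward_begin_backup,
        .begin_archive = bbsink_forward_begin_archive,
        .archive_contents = bbsink_forward_archive_contents,
        .end_archive = bbsink_forward_end_archive,
        .begin_manifest = bbsink_forward_begin_manifest,
        .manifest_contents = bbsink_forward_manifest_contents,
        .end_manifest = bbsink_forward_end_manifest,
        .end_backup = bbsink_forward_end_backup,
        .cleanup = bbsink_forward_cleanup
    };

    bbsink *
    bbsink_noop_new(bbsink *next)
    {
        bbsink *sink = palloc0(sizeof(bbsink));

        *((const bbsink_ops **) &sink->bbs_ops) = &bbsink_noop_ops;
        sink->bbs_next = next;
        return sink;
    }

A real filter overrides only the callbacks it cares about, as the throttling
sink below does.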
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
new file mode 100644
index 0000000000..f163931f8a
--- /dev/null
+++ b/src/backend/replication/basebackup_throttle.c
@@ -0,0 +1,199 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_throttle.c
+ * Basebackup sink implementing throttling. Data is forwarded to the
+ * next base backup sink in the chain at a rate no greater than the
+ * configured maximum.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_throttle.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup_sink.h"
+#include "pgstat.h"
+#include "storage/latch.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_throttle
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Transferring this many bytes may trigger a throttling sleep. */
+ uint64 throttling_sample;
+
+ /* Amount of data already transferred but not yet throttled. */
+ int64 throttling_counter;
+
+ /* The minimum time required to transfer throttling_sample bytes. */
+ TimeOffset elapsed_min_unit;
+
+ /* The last check of the transfer rate. */
+ TimestampTz throttled_last;
+} bbsink_throttle;
+
+static void bbsink_throttle_begin_backup(bbsink *sink);
+static void bbsink_throttle_archive_contents(bbsink *sink, size_t len);
+static void bbsink_throttle_manifest_contents(bbsink *sink, size_t len);
+static void throttle(bbsink_throttle *sink, size_t increment);
+
+const bbsink_ops bbsink_throttle_ops = {
+ .begin_backup = bbsink_throttle_begin_backup,
+ .begin_archive = bbsink_forward_begin_archive,
+ .archive_contents = bbsink_throttle_archive_contents,
+ .end_archive = bbsink_forward_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_throttle_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * How frequently to throttle, as a fraction of the specified rate-second.
+ */
+#define THROTTLING_FREQUENCY 8
+
+/*
+ * Create a new basebackup sink that performs throttling and forwards data
+ * to a successor sink.
+ */
+bbsink *
+bbsink_throttle_new(bbsink *next, uint32 maxrate)
+{
+ bbsink_throttle *sink;
+
+ Assert(next != NULL);
+ Assert(maxrate > 0);
+
+ sink = palloc0(sizeof(bbsink_throttle));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_throttle_ops;
+ sink->base.bbs_next = next;
+
+ sink->throttling_sample =
+ (int64) maxrate * (int64) 1024 / THROTTLING_FREQUENCY;
+
+ /*
+ * The minimum amount of time for throttling_sample bytes to be
+ * transferred.
+ */
+ sink->elapsed_min_unit = USECS_PER_SEC / THROTTLING_FREQUENCY;
+
+ return &sink->base;
+}
+
+/*
+ * There's no real work to do here, but we need to record the current time so
+ * that it can be used for future calculations.
+ */
+static void
+bbsink_throttle_begin_backup(bbsink *sink)
+{
+ bbsink_throttle *mysink = (bbsink_throttle *) sink;
+
+ bbsink_forward_begin_backup(sink);
+
+ /* The 'real data' starts now (header was ignored). */
+ mysink->throttled_last = GetCurrentTimestamp();
+}
+
+/*
+ * First throttle, and then pass archive contents to next sink.
+ */
+static void
+bbsink_throttle_archive_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * First throttle, and then pass manifest contents to next sink.
+ */
+static void
+bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
+{
+ throttle((bbsink_throttle *) sink, len);
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * Increment the network transfer counter by the given number of bytes,
+ * and sleep if necessary to comply with the requested network transfer
+ * rate.
+ */
+static void
+throttle(bbsink_throttle *sink, size_t increment)
+{
+ TimeOffset elapsed_min;
+
+ Assert(sink->throttling_counter >= 0);
+
+ sink->throttling_counter += increment;
+ if (sink->throttling_counter < sink->throttling_sample)
+ return;
+
+ /* How much time should have elapsed at minimum? */
+ elapsed_min = sink->elapsed_min_unit *
+ (sink->throttling_counter / sink->throttling_sample);
+
+ /*
+ * Since the latch could be set repeatedly because of concurrent WAL
+ * activity, sleep in a loop to ensure enough time has passed.
+ */
+ for (;;)
+ {
+ TimeOffset elapsed,
+ sleep;
+ int wait_result;
+
+ /* Time elapsed since the last measurement (and possible wake up). */
+ elapsed = GetCurrentTimestamp() - sink->throttled_last;
+
+ /* sleep if the transfer is faster than it should be */
+ sleep = elapsed_min - elapsed;
+ if (sleep <= 0)
+ break;
+
+ ResetLatch(MyLatch);
+
+ /* We're eating a potentially set latch, so check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * (TAR_SEND_SIZE / throttling_sample * elapsed_min_unit) should be
+ * the maximum time to sleep. Thus the cast to long is safe.
+ */
+ wait_result = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ (long) (sleep / 1000),
+ WAIT_EVENT_BASE_BACKUP_THROTTLE);
+
+ if (wait_result & WL_LATCH_SET)
+ CHECK_FOR_INTERRUPTS();
+
+ /* Done waiting? */
+ if (wait_result & WL_TIMEOUT)
+ break;
+ }
+
+ /*
+ * Since we work with integers, only a whole multiple of throttling_sample
+ * has been processed. The remainder will be handled during the next call
+ * of this function.
+ */
+ sink->throttling_counter %= sink->throttling_sample;
+
+ /*
+ * Time interval for the remaining amount and possible next increments
+ * starts now.
+ */
+ sink->throttled_last = GetCurrentTimestamp();
+}
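To put numbers on the above: with MAX_RATE = 1024 (kB/s) and
THROTTLING_FREQUENCY = 8, throttling_sample = 1024 * 1024 / 8 = 131072 bytes
and elapsed_min_unit = 1000000 / 8 = 125000 microseconds. So each time 128kB
have accumulated, the sink requires that at least 125ms have passed since
throttled_last, and sleeps off any shortfall before letting more data
through.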
diff --git a/src/include/replication/backup_manifest.h b/src/include/replication/backup_manifest.h
index 099108910c..16ed7eec9b 100644
--- a/src/include/replication/backup_manifest.h
+++ b/src/include/replication/backup_manifest.h
@@ -12,9 +12,9 @@
#ifndef BACKUP_MANIFEST_H
#define BACKUP_MANIFEST_H
-#include "access/xlogdefs.h"
#include "common/checksum_helper.h"
#include "pgtime.h"
+#include "replication/basebackup_sink.h"
#include "storage/buffile.h"
typedef enum manifest_option
@@ -47,7 +47,8 @@ extern void AddWALInfoToBackupManifest(backup_manifest_info *manifest,
XLogRecPtr startptr,
TimeLineID starttli, XLogRecPtr endptr,
TimeLineID endtli);
-extern void SendBackupManifest(backup_manifest_info *manifest);
+
+extern void SendBackupManifest(backup_manifest_info *manifest, bbsink *sink);
extern void FreeBackupManifest(backup_manifest_info *manifest);
#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
new file mode 100644
index 0000000000..e6c073c567
--- /dev/null
+++ b/src/include/replication/basebackup_sink.h
@@ -0,0 +1,296 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_sink.h
+ * API for filtering or sending to a final destination the archives
+ * produced by the base backup process
+ *
+ * Taking a base backup produces one archive per tablespace directory,
+ * plus a backup manifest unless that feature has been disabled. The
+ * goal of the backup process is to put those archives and that manifest
+ * someplace, possibly after postprocessing them in some way. A 'bbsink'
+ * is an object to which those archives, and the manifest if present,
+ * can be sent.
+ *
+ * In practice, there will be a chain of 'bbsink' objects rather than
+ * just one, with callbacks being forwarded from one to the next,
+ * possibly with modification. Each object is responsible for a
+ * single task e.g. command progress reporting, throttling, or
+ * communication with the client.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_sink.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_SINK_H
+#define BASEBACKUP_SINK_H
+
+#include "access/xlog_internal.h"
+#include "nodes/pg_list.h"
+
+/* Forward declarations. */
+struct bbsink;
+struct bbsink_ops;
+typedef struct bbsink bbsink;
+typedef struct bbsink_ops bbsink_ops;
+
+/*
+ * Overall backup state shared by all bbsink objects for a backup.
+ *
+ * Before calling bbsink_begin_backup, the caller must initialize a
+ * bbsink_state object which will last for the lifetime of the backup, and
+ * must thereafter update it as required before each new call to a bbsink
+ * method. The bbsink will retain a pointer to the state object and will
+ * consult it to track the progress of the backup.
+ *
+ * 'tablespaces' is a list of tablespaceinfo objects. It must be set before
+ * calling bbsink_begin_backup() and must not be modified thereafter.
+ *
+ * 'tablespace_num' is the index of the current tablespace within the list
+ * stored in 'tablespaces'.
+ *
+ * 'bytes_done' is the number of bytes read so far from $PGDATA.
+ *
+ * 'bytes_total' is the total number of bytes estimated to be present in
+ * $PGDATA, if we have estimated this.
+ *
+ * 'bytes_total_is_valid' is true if and only if a proper estimate has been
+ * stored into 'bytes_total'.
+ *
+ * 'startptr' and 'starttli' identify the point in the WAL stream at which
+ * the backup began. They must be set before calling bbstate_begin_backup()
+ * and must not be modified thereafter.
+ */
+typedef struct bbsink_state
+{
+ List *tablespaces;
+ int tablespace_num;
+ uint64 bytes_done;
+ uint64 bytes_total;
+ bool bytes_total_is_valid;
+ XLogRecPtr startptr;
+ TimeLineID starttli;
+} bbsink_state;
+
+/*
+ * Common data for any type of basebackup sink.
+ *
+ * 'bbs_ops' is the relevant callback table.
+ *
+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. Its length must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.
+ *
+ * 'bbs_next' is a pointer to another bbsink to which this bbsink is
+ * forwarding some or all operations.
+ *
+ * 'bbs_state' is a pointer to the bbsink_state object for this backup.
+ * Every bbsink associated with this backup should point to the same
+ * underlying state object.
+ *
+ * In general it is expected that the values of these fields are set when
+ * a bbsink is created and that they do not change thereafter. It's OK
+ * to modify the data to which bbs_buffer or bbs_state point, but no changes
+ * should be made to the contents of this struct.
+ */
+struct bbsink
+{
+ const bbsink_ops *bbs_ops;
+ char *bbs_buffer;
+ size_t bbs_buffer_length;
+ bbsink *bbs_next;
+ bbsink_state *bbs_state;
+};
+
+/*
+ * Callbacks for a base backup sink.
+ *
+ * All of these callbacks are required. If a particular callback just needs to
+ * forward the call to sink->bbs_next, use bbsink_forward_<callback_name> as
+ * the callback.
+ *
+ * Callers should always invoke these callbacks via the bbsink_* inline
+ * functions rather than calling them directly.
+ */
+struct bbsink_ops
+{
+ /*
+ * This callback is invoked just once, at the very start of the backup. It
+ * must set bbs_buffer to point to a chunk of storage where at least
+ * bbs_buffer_length bytes of data can be written.
+ */
+ void (*begin_backup) (bbsink *sink);
+
+ /*
+ * For each archive transmitted to a bbsink, there will be one call to the
+ * begin_archive() callback, some number of calls to the
+ * archive_contents() callback, and then one call to the end_archive()
+ * callback.
+ *
+ * Before invoking the archive_contents() callback, the caller should copy
+ * a number of bytes equal to what will be passed as len into bbs_buffer,
+ * but not more than bbs_buffer_length.
+ *
+ * It's generally good if the buffer is as full as possible before the
+ * archive_contents() callback is invoked, but it's not worth expending
+ * extra cycles to make sure it's absolutely 100% full.
+ */
+ void (*begin_archive) (bbsink *sink, const char *archive_name);
+ void (*archive_contents) (bbsink *sink, size_t len);
+ void (*end_archive) (bbsink *sink);
+
+ /*
+ * If a backup manifest is to be transmitted to a bbsink, there will be
+ * one call to the begin_manifest() callback, some number of calls to the
+ * manifest_contents() callback, and then one call to the end_manifest()
+ * callback. These calls will occur after all archives are transmitted.
+ *
+ * The rules for invoking the manifest_contents() callback are the same as
+ * for the archive_contents() callback above.
+ */
+ void (*begin_manifest) (bbsink *sink);
+ void (*manifest_contents) (bbsink *sink, size_t len);
+ void (*end_manifest) (bbsink *sink);
+
+ /*
+ * This callback is invoked just once, after all archives and the manifest
+ * have been sent.
+ */
+ void (*end_backup) (bbsink *sink, XLogRecPtr endptr, TimeLineID endtli);
+
+ /*
+ * If a backup is aborted by an error, this callback is invoked before the
+ * bbsink object is destroyed, so that it can release any resources that
+ * would not be released automatically. If no error occurs, this callback
+ * is invoked after the end_backup callback.
+ */
+ void (*cleanup) (bbsink *sink);
+};
+
+/* Begin a backup. */
+static inline void
+bbsink_begin_backup(bbsink *sink, bbsink_state *state, int buffer_length)
+{
+ Assert(sink != NULL);
+
+ Assert(buffer_length > 0);
+
+ sink->bbs_state = state;
+ sink->bbs_buffer_length = buffer_length;
+ sink->bbs_ops->begin_backup(sink);
+
+ Assert(sink->bbs_buffer != NULL);
+ Assert((sink->bbs_buffer_length % BLCKSZ) == 0);
+}
+
+/* Begin an archive. */
+static inline void
+bbsink_begin_archive(bbsink *sink, const char *archive_name)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_archive(sink, archive_name);
+}
+
+/* Process some of the contents of an archive. */
+static inline void
+bbsink_archive_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /*
+ * The caller should make a reasonable attempt to fill the buffer before
+ * calling this function, so it shouldn't be completely empty. Nor should
+ * it be filled beyond capacity.
+ */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->archive_contents(sink, len);
+}
+
+/* Finish an archive. */
+static inline void
+bbsink_end_archive(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_archive(sink);
+}
+
+/* Begin the backup manifest. */
+static inline void
+bbsink_begin_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->begin_manifest(sink);
+}
+
+/* Process some of the manifest contents. */
+static inline void
+bbsink_manifest_contents(bbsink *sink, size_t len)
+{
+ Assert(sink != NULL);
+
+ /* See comments in bbsink_archive_contents. */
+ Assert(len > 0 && len <= sink->bbs_buffer_length);
+
+ sink->bbs_ops->manifest_contents(sink, len);
+}
+
+/* Finish the backup manifest. */
+static inline void
+bbsink_end_manifest(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->end_manifest(sink);
+}
+
+/* Finish a backup. */
+static inline void
+bbsink_end_backup(bbsink *sink, XLogRecPtr endptr, TimeLineID endtli)
+{
+ Assert(sink != NULL);
+ Assert(sink->bbs_state->tablespace_num == list_length(sink->bbs_state->tablespaces));
+
+ sink->bbs_ops->end_backup(sink, endptr, endtli);
+}
+
+/* Release resources before destruction. */
+static inline void
+bbsink_cleanup(bbsink *sink)
+{
+ Assert(sink != NULL);
+
+ sink->bbs_ops->cleanup(sink);
+}
+
+/* Forwarding callbacks. Use these to pass operations through to next sink. */
+extern void bbsink_forward_begin_backup(bbsink *sink);
+extern void bbsink_forward_begin_archive(bbsink *sink,
+ const char *archive_name);
+extern void bbsink_forward_archive_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_archive(bbsink *sink);
+extern void bbsink_forward_begin_manifest(bbsink *sink);
+extern void bbsink_forward_manifest_contents(bbsink *sink, size_t len);
+extern void bbsink_forward_end_manifest(bbsink *sink);
+extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+extern void bbsink_forward_cleanup(bbsink *sink);
+
+/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
+
+/* Extra interface functions for progress reporting. */
+extern void basebackup_progress_wait_checkpoint(void);
+extern void basebackup_progress_estimate_backup_size(void);
+extern void basebackup_progress_wait_wal_archive(bbsink_state *);
+extern void basebackup_progress_transfer_wal(void);
+extern void basebackup_progress_done(void);
+
+#endif
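To tie the interface together: the expectation is that a caller builds the
chain once at backup start, creating the destination sink first and wrapping
filters around it, so that callbacks flow from the outermost sink inward. A
minimal sketch using the constructors declared above; the
estimate_backup_size value is an illustrative stand-in for the caller's
options:

    /* Editorial sketch: composing a sink chain, innermost sink first. */
    bool    estimate_backup_size = true;    /* hypothetical option value */
    bbsink *sink = bbsink_copytblspc_new(); /* sends archives to the client */

    if (maxrate > 0)
        sink = bbsink_throttle_new(sink, maxrate);  /* rate limiting */
    sink = bbsink_progress_new(sink, estimate_backup_size); /* reporting */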
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 40fbcddd20..bd9f2b62ef 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3769,3 +3769,7 @@ yyscan_t
z_stream
z_streamp
zic_t
+bbsink
+bbsink_ops
+bbsink_state
+bbsink_throttle
--
2.24.3 (Apple Git-128)
Attachment: v8-0003-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
From 094593190b869d661167e95155bc0fcf7fe22943 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 15:41:43 -0400
Subject: [PATCH v8 3/5] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
---
src/backend/replication/basebackup.c | 61 ++-
src/backend/replication/basebackup_copy.c | 277 +++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
src/tools/pgindent/typedefs.list | 3 +
5 files changed, 731 insertions(+), 54 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 482872b45c..06ba23fca7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -81,6 +88,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt, bbsink *sink);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -374,7 +382,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(&state);
@@ -612,6 +623,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -679,8 +691,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -821,6 +835,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -866,8 +896,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- /* Create a basic basebackup sink. */
- sink = bbsink_copytblspc_new();
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
@@ -1696,6 +1733,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that system, the
+ * server must not properly terminate the archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
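(For scale: TAR_BLOCK_SIZE is 512, so the two bbsink_archive_contents()
calls above emit the 1024 trailing zero bytes that the tar format expects at
end-of-archive.)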
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 30bab4546e..57183f4d46 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits all
+ * the archives, and the manifest if present, during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result has the same format in every method.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,52 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+static void bbsink_copystream_cleanup(bbsink *sink);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -38,6 +103,18 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup,
+ .cleanup = bbsink_copystream_cleanup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -50,6 +127,202 @@ const bbsink_ops bbsink_copytblspc_ops = {
.cleanup = bbsink_copytblspc_cleanup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins with
+ * the type byte we're going to need, and then arrange things so that
+ * the data we're given will be written just after that type byte. That
+ * will allow us to ship the data with a single call to pq_putmessage and
+ * without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
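The resulting buffer layout, sketched editorially:

    msgbuffer:  [ 'd' ][ payload: up to bbs_buffer_length bytes ]
                  ^      ^
                  |      bbs_buffer points here (msgbuffer + 1)
                  type byte, written once here in begin_backup

With this arrangement, a chunk can be shipped with a single
pq_putmessage('d', msgbuffer, len + 1) and no extra copying, as the
archive_contents and manifest_contents callbacks below do.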
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Cleanup.
+ */
+static void
+bbsink_copystream_cleanup(bbsink *sink)
+{
+ /* Nothing to do. */
+}
+
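Putting the message types together, a copystream session looks roughly like
this on the wire (editorial sketch; lengths and payload bytes elided, and
the tablespace OID and path are made up):

    CopyOutResponse
      'n' "base.tar" ""         new archive: the main data directory
      'd' <archive bytes>       repeated as many times as needed
      'p' <bytes_done>          periodic progress report
      'n' "16385.tar" "/ts1"    next archive: a user tablespace
      'd' <archive bytes> ...
      'm'                       manifest follows
      'd' <manifest bytes> ...
    CopyDone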
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 67d01d8b6e..0a9eb8ca7e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -167,6 +177,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -978,10 +995,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1008,8 +1026,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1049,16 +1067,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1069,8 +1087,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1080,6 +1098,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1333,28 +1662,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1477,46 +1810,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index e6c073c567..36b9b76c5f 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,6 +282,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4846efbe10..8410829aef 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3769,7 +3769,10 @@ yyscan_t
z_stream
z_streamp
zic_t
+ArchiveStreamState
+backup_target_type
bbsink
+bbsink_copystream
bbsink_ops
bbsink_state
bbsink_throttle
--
2.24.3 (Apple Git-128)
Attachment: v8-0002-Introduce-bbstreamer-abstraction-to-modularize-pg.patch
From 664780ea445f57c233fb3d2e0df2f3168db863ff Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 15:41:41 -0400
Subject: [PATCH v8 2/5] Introduce 'bbstreamer' abstraction to modularize
pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unfortunately, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and then gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These changes do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
---
src/bin/pg_basebackup/Makefile | 12 +-
src/bin/pg_basebackup/bbstreamer.h | 217 +++++
src/bin/pg_basebackup/bbstreamer_file.c | 579 ++++++++++++++
src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 912 +++++-----------------
src/tools/pgindent/typedefs.list | 10 +
7 files changed, 1697 insertions(+), 727 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer.h
create mode 100644 src/bin/pg_basebackup/bbstreamer_file.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_inject.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_tar.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..8fda09dcd4 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -34,10 +34,16 @@ OBJS = \
streamutil.o \
walmethods.o
+BBOBJS = \
+ pg_basebackup.o \
+ bbstreamer_file.o \
+ bbstreamer_inject.o \
+ bbstreamer_tar.o
+
all: pg_basebackup pg_receivewal pg_recvlogical
-pg_basebackup: pg_basebackup.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
- $(CC) $(CFLAGS) pg_basebackup.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
+ $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) pg_receivewal.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
@@ -60,7 +66,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f pg_basebackup$(X) pg_receivewal$(X) pg_recvlogical$(X) \
- pg_basebackup.o pg_receivewal.o pg_recvlogical.o \
+ $(BBOBJS) pg_receivewal.o pg_recvlogical.o \
$(OBJS)
rm -rf tmp_check
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
new file mode 100644
index 0000000000..b24dc848c1
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -0,0 +1,217 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer.h
+ *
+ * Each tar archive returned by the server is passed to one or more
+ * bbstreamer objects for further processing. The bbstreamer may do
+ * something simple, like write the archive to a file, perhaps after
+ * compressing it, but it can also do more complicated things, like
+ * annotating the byte stream to indicate which parts of the data
+ * correspond to tar headers or trailing padding, vs. which parts are
+ * payload data. A subsequent bbstreamer may use this information to
+ * make further decisions about how to process the data; for example,
+ * it might choose to modify the archive contents.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BBSTREAMER_H
+#define BBSTREAMER_H
+
+#include "lib/stringinfo.h"
+#include "pqexpbuffer.h"
+
+struct bbstreamer;
+struct bbstreamer_ops;
+typedef struct bbstreamer bbstreamer;
+typedef struct bbstreamer_ops bbstreamer_ops;
+
+/*
+ * Each chunk of archive data passed to a bbstreamer is classified into one
+ * of these categories. When data is first received from the remote server,
+ * each chunk will be categorized as BBSTREAMER_UNKNOWN, and the chunks will
+ * be of whatever size the remote server chose to send.
+ *
+ * If the archive is parsed (e.g. see bbstreamer_tar_parser_new()), then all
+ * chunks should be labelled as one of the other types listed here. In
+ * addition, there should be exactly one BBSTREAMER_MEMBER_HEADER chunk and
+ * exactly one BBSTREAMER_MEMBER_TRAILER chunk per archive member, even if
+ * that means a zero-length call. There can be any number of
+ * BBSTREAMER_MEMBER_CONTENTS chunks in between those calls. There should
+ * be exactly one BBSTREAMER_ARCHIVE_TRAILER chunk, and it should follow the
+ * last BBSTREAMER_MEMBER_TRAILER chunk.
+ *
+ * In theory, we could need other classifications here, such as a way of
+ * indicating an archive header, but the "tar" format doesn't need anything
+ * else, so for the time being there's no point.
+ */
+typedef enum
+{
+ BBSTREAMER_UNKNOWN,
+ BBSTREAMER_MEMBER_HEADER,
+ BBSTREAMER_MEMBER_CONTENTS,
+ BBSTREAMER_MEMBER_TRAILER,
+ BBSTREAMER_ARCHIVE_TRAILER
+} bbstreamer_archive_context;
+
+/*
+ * Each chunk of data that is classified as BBSTREAMER_MEMBER_HEADER,
+ * BBSTREAMER_MEMBER_CONTENTS, or BBSTREAMER_MEMBER_TRAILER should also
+ * pass a pointer to an instance of this struct. The details are expected
+ * to be present in the archive header and used to fill the struct, after
+ * which all subsequent calls for the same archive member are expected to
+ * pass the same details.
+ */
+typedef struct
+{
+ char pathname[MAXPGPATH];
+ pgoff_t size;
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ bool is_directory;
+ bool is_link;
+ char linktarget[MAXPGPATH];
+} bbstreamer_member;
+
+/*
+ * Generally, each type of bbstreamer will define its own struct, but the
+ * first element should be 'bbstreamer base'. A bbstreamer that does not
+ * require any additional private data could use this structure directly.
+ *
+ * bbs_ops is a pointer to the bbstreamer_ops object which contains the
+ * function pointers appropriate to this type of bbstreamer.
+ *
+ * bbs_next is a pointer to the successor bbstreamer, for those types of
+ * bbstreamer which forward data to a successor. It need not be used and
+ * should be set to NULL when not relevant.
+ *
+ * bbs_buffer is a buffer for accumulating data for temporary storage. Each
+ * type of bbstreamer makes its own decisions about whether and how to use
+ * this buffer.
+ */
+struct bbstreamer
+{
+ const bbstreamer_ops *bbs_ops;
+ bbstreamer *bbs_next;
+ StringInfoData bbs_buffer;
+};
+
+/*
+ * There are three callbacks for a bbstreamer. The 'content' callback is
+ * called repeatedly, as described in the bbstreamer_archive_context comments.
+ * Then, the 'finalize' callback is called once at the end, to give the
+ * bbstreamer a chance to perform cleanup such as closing files. Finally,
+ * because this code is running in a frontend environment where, as of this
+ * writing, there are no memory contexts, the 'free' callback is called to
+ * release memory. These callbacks should always be invoked using the static
+ * inline functions defined below.
+ */
+struct bbstreamer_ops
+{
+ void (*content) (bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+ void (*finalize) (bbstreamer *streamer);
+ void (*free) (bbstreamer *streamer);
+};
+
+/* Send some content to a bbstreamer. */
+static inline void
+bbstreamer_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->content(streamer, member, data, len, context);
+}
+
+/* Finalize a bbstreamer. */
+static inline void
+bbstreamer_finalize(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->finalize(streamer);
+}
+
+/* Free a bbstreamer. */
+static inline void
+bbstreamer_free(bbstreamer *streamer)
+{
+ Assert(streamer != NULL);
+ streamer->bbs_ops->free(streamer);
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It adds the amount of data specified by
+ * 'nbytes' to the bbstreamer's buffer and adjusts '*len' and '*data'
+ * accordingly.
+ */
+static inline void
+bbstreamer_buffer_bytes(bbstreamer *streamer, const char **data, int *len,
+ int nbytes)
+{
+ Assert(nbytes <= *len);
+
+ appendBinaryStringInfo(&streamer->bbs_buffer, *data, nbytes);
+ *len -= nbytes;
+ *data += nbytes;
+}
+
+/*
+ * This is a convenience method for use when implementing a bbstreamer; it is
+ * not for use by outside callers. It attempts to add enough data to the
+ * bbstreamer's buffer to reach a length of target_bytes and adjusts '*len'
+ * and '*data' accordingly. It returns true if the target length has been
+ * reached and false otherwise.
+ */
+static inline bool
+bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
+ int target_bytes)
+{
+ int buflen = streamer->bbs_buffer.len;
+
+ if (buflen >= target_bytes)
+ {
+ /* Target length already reached; nothing to do. */
+ return true;
+ }
+
+ if (buflen + *len < target_bytes)
+ {
+ /* Not enough data to reach target length; buffer all of it. */
+ bbstreamer_buffer_bytes(streamer, data, len, *len);
+ return false;
+ }
+
+ /* Buffer just enough to reach the target length. */
+ bbstreamer_buffer_bytes(streamer, data, len, target_bytes - buflen);
+ return true;
+}
+
+/*
+ * Functions for creating bbstreamer objects of various types. See the header
+ * comments for each of these functions for details.
+ */
+extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
+extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ int compresslevel);
+extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *));
+
+extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
+
+extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents);
+extern void bbstreamer_inject_file(bbstreamer *streamer, char *pathname,
+ char *data, int len);
+
+#endif
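(Aside, not part of the patch: to illustrate the contract this header
defines, a hypothetical no-op pass-through bbstreamer could look like the
following. Each implementation just fills in the ops table and, if it
forwards data, sends it on to bbs_next.)

    /* Hypothetical pass-through bbstreamer, for illustration only. */
    typedef struct bbstreamer_passthrough
    {
        bbstreamer  base;
    } bbstreamer_passthrough;

    static void
    bbstreamer_passthrough_content(bbstreamer *streamer,
                                   bbstreamer_member *member,
                                   const char *data, int len,
                                   bbstreamer_archive_context context)
    {
        /* Forward every chunk, unmodified, to the successor. */
        bbstreamer_content(streamer->bbs_next, member, data, len, context);
    }

    static void
    bbstreamer_passthrough_finalize(bbstreamer *streamer)
    {
        bbstreamer_finalize(streamer->bbs_next);
    }

    static void
    bbstreamer_passthrough_free(bbstreamer *streamer)
    {
        bbstreamer_free(streamer->bbs_next);
        pfree(streamer);
    }

    static const bbstreamer_ops bbstreamer_passthrough_ops = {
        .content = bbstreamer_passthrough_content,
        .finalize = bbstreamer_passthrough_finalize,
        .free = bbstreamer_passthrough_free
    };

    bbstreamer *
    bbstreamer_passthrough_new(bbstreamer *next)
    {
        bbstreamer_passthrough *streamer;

        streamer = palloc0(sizeof(bbstreamer_passthrough));
        *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
            &bbstreamer_passthrough_ops;
        streamer->base.bbs_next = next;

        return &streamer->base;
    }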
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
new file mode 100644
index 0000000000..03e1ea2550
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -0,0 +1,579 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_file.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_file.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include <unistd.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+typedef struct bbstreamer_plain_writer
+{
+ bbstreamer base;
+ char *pathname;
+ FILE *file;
+ bool should_close_file;
+} bbstreamer_plain_writer;
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+#endif
+
+typedef struct bbstreamer_extractor
+{
+ bbstreamer base;
+ char *basepath;
+ const char *(*link_map) (const char *);
+ void (*report_output_file) (const char *);
+ char filename[MAXPGPATH];
+ FILE *file;
+} bbstreamer_extractor;
+
+static void bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_plain_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_plain_writer_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_plain_writer_ops = {
+ .content = bbstreamer_plain_writer_content,
+ .finalize = bbstreamer_plain_writer_finalize,
+ .free = bbstreamer_plain_writer_free
+};
+
+#ifdef HAVE_LIBZ
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+#endif
+
+static void bbstreamer_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_extractor_free(bbstreamer *streamer);
+static void extract_directory(const char *filename, mode_t mode);
+static void extract_link(const char *filename, const char *linktarget);
+static FILE *create_file_for_extract(const char *filename, mode_t mode);
+
+const bbstreamer_ops bbstreamer_extractor_ops = {
+ .content = bbstreamer_extractor_content,
+ .finalize = bbstreamer_extractor_finalize,
+ .free = bbstreamer_extractor_free
+};
+
+/*
+ * Create a bbstreamer that just writes data to a file.
+ *
+ * The caller must specify a pathname and may specify a file. The pathname is
+ * used for error-reporting purposes either way. If file is NULL, the pathname
+ * also identifies the file to which the data should be written: it is opened
+ * for writing and closed when done. If file is not NULL, the data is written
+ * there.
+ */
+bbstreamer *
+bbstreamer_plain_writer_new(char *pathname, FILE *file)
+{
+ bbstreamer_plain_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_plain_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_plain_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+ streamer->file = file;
+
+ if (file == NULL)
+ {
+ streamer->file = fopen(pathname, "wb");
+ if (streamer->file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", pathname);
+ exit(1);
+ }
+ streamer->should_close_file = true;
+ }
+
+ return &streamer->base;
+}
+
+/*
+ * Write archive content to file.
+ */
+static void
+bbstreamer_plain_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a plain file consists of closing
+ * the file if we opened it, but not if the caller provided it.
+ */
+static void
+bbstreamer_plain_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ if (mystreamer->should_close_file && fclose(mystreamer->file) != 0)
+ {
+ pg_log_error("could not close file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->file = NULL;
+ mystreamer->should_close_file = false;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_plain_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_plain_writer *mystreamer;
+
+ mystreamer = (bbstreamer_plain_writer *) streamer;
+
+ Assert(!mystreamer->should_close_file);
+ Assert(mystreamer->base.bbs_next == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it is also opened and
+ * closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %s",
+ mystreamer->pathname,
+ get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a bbstreamer that extracts an archive.
+ *
+ * All pathnames in the archive are interpreted relative to basepath.
+ *
+ * Unlike e.g. bbstreamer_plain_writer_new(), we can't do anything useful here
+ * with untyped chunks; we need typed chunks which follow the rules described
+ * in bbstreamer.h. Assuming we have that, we don't need to worry about the
+ * original archive format; it's enough to just look at the member information
+ * provided and write to the corresponding file.
+ *
+ * 'link_map' is a function that will be applied to the target of any
+ * symbolic link, and which should return a replacement pathname to be used
+ * in its place. If NULL, the symbolic link target is used without
+ * modification.
+ *
+ * 'report_output_file' is a function that will be called each time we open a
+ * new output file. The pathname to that file is passed as an argument. If
+ * NULL, the call is skipped.
+ */
+bbstreamer *
+bbstreamer_extractor_new(const char *basepath,
+ const char *(*link_map) (const char *),
+ void (*report_output_file) (const char *))
+{
+ bbstreamer_extractor *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_extractor_ops;
+ streamer->basepath = pstrdup(basepath);
+ streamer->link_map = link_map;
+ streamer->report_output_file = report_output_file;
+
+ return &streamer->base;
+}
+
+/*
+ * Extract archive contents to the filesystem.
+ */
+static void
+bbstreamer_extractor_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+ int fnamelen;
+
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ Assert(mystreamer->file == NULL);
+
+ /* Prepend basepath. */
+ snprintf(mystreamer->filename, sizeof(mystreamer->filename),
+ "%s/%s", mystreamer->basepath, member->pathname);
+
+ /* Remove any trailing slash. */
+ fnamelen = strlen(mystreamer->filename);
+ if (mystreamer->filename[fnamelen - 1] == '/')
+ mystreamer->filename[fnamelen - 1] = '\0';
+
+ /* Dispatch based on file type. */
+ if (member->is_directory)
+ extract_directory(mystreamer->filename, member->mode);
+ else if (member->is_link)
+ {
+ const char *linktarget = member->linktarget;
+
+ if (mystreamer->link_map)
+ linktarget = mystreamer->link_map(linktarget);
+ extract_link(mystreamer->filename, linktarget);
+ }
+ else
+ mystreamer->file =
+ create_file_for_extract(mystreamer->filename,
+ member->mode);
+
+ /* Report output file change. */
+ if (mystreamer->report_output_file)
+ mystreamer->report_output_file(mystreamer->filename);
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ if (mystreamer->file == NULL)
+ break;
+
+ errno = 0;
+ if (len > 0 && fwrite(data, len, 1, mystreamer->file) != 1)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ mystreamer->filename);
+ exit(1);
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ if (mystreamer->file == NULL)
+ break;
+ fclose(mystreamer->file);
+ mystreamer->file = NULL;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while extracting archive");
+ exit(1);
+ }
+}
+
+/*
+ * Create a directory.
+ */
+static void
+extract_directory(const char *filename, mode_t mode)
+{
+ if (mkdir(filename, pg_dir_create_mode) != 0)
+ {
+ /*
+ * When streaming WAL, pg_wal (or pg_xlog for pre-9.6 clusters) will
+ * have been created by the wal receiver process. Also, when the WAL
+ * directory location was specified, pg_wal (or pg_xlog) has already
+ * been created as a symbolic link before starting the actual backup.
+ * So just ignore creation failures on related directories.
+ */
+ if (!((pg_str_endswith(filename, "/pg_wal") ||
+ pg_str_endswith(filename, "/pg_xlog") ||
+ pg_str_endswith(filename, "/archive_status")) &&
+ errno == EEXIST))
+ {
+ pg_log_error("could not create directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on directory \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+}
+
+/*
+ * Create a symbolic link.
+ *
+ * It's most likely a link in pg_tblspc directory, to the location of a
+ * tablespace. Apply any tablespace mapping given on the command line
+ * (--tablespace-mapping). (We blindly apply the mapping without checking that
+ * the link really is inside pg_tblspc. We don't expect there to be other
+ * symlinks in a data directory, but if there are, you can call it an
+ * undocumented feature that you can map them too.)
+ */
+static void
+extract_link(const char *filename, const char *linktarget)
+{
+ if (symlink(linktarget, filename) != 0)
+ {
+ pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
+ filename, linktarget);
+ exit(1);
+ }
+}
+
+/*
+ * Create a regular file.
+ *
+ * Return the resulting handle so we can write the content to the file.
+ */
+static FILE *
+create_file_for_extract(const char *filename, mode_t mode)
+{
+ FILE *file;
+
+ file = fopen(filename, "wb");
+ if (file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m", filename);
+ exit(1);
+ }
+
+#ifndef WIN32
+ if (chmod(filename, mode))
+ {
+ pg_log_error("could not set permissions on file \"%s\": %m",
+ filename);
+ exit(1);
+ }
+#endif
+
+ return file;
+}
+
+/*
+ * End-of-stream processing for extracting an archive.
+ *
+ * There's nothing to do here but sanity checking.
+ */
+static void
+bbstreamer_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ Assert(mystreamer->file == NULL);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer;
+
+ pfree(mystreamer->basepath);
+ pfree(mystreamer);
+}
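(Aside, not part of the patch: a caller that only wants to dump raw COPY
data to a file could, hypothetically, drive the plain writer directly;
'copybuf' and 'r' stand in for a chunk received from the server.)

    bbstreamer *streamer = bbstreamer_plain_writer_new("base.tar", NULL);

    /* The plain writer ignores chunk classification, so UNKNOWN is fine. */
    bbstreamer_content(streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);

    bbstreamer_finalize(streamer);
    bbstreamer_free(streamer);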
diff --git a/src/bin/pg_basebackup/bbstreamer_inject.c b/src/bin/pg_basebackup/bbstreamer_inject.c
new file mode 100644
index 0000000000..4d15251fdc
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_inject.c
@@ -0,0 +1,250 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_inject.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_inject.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "bbstreamer.h"
+#include "common/file_perm.h"
+#include "common/logging.h"
+
+typedef struct bbstreamer_recovery_injector
+{
+ bbstreamer base;
+ bool skip_file;
+ bool is_recovery_guc_supported;
+ bool is_postgresql_auto_conf;
+ bool found_postgresql_auto_conf;
+ PQExpBuffer recoveryconfcontents;
+ bbstreamer_member member;
+} bbstreamer_recovery_injector;
+
+static void bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_recovery_injector_finalize(bbstreamer *streamer);
+static void bbstreamer_recovery_injector_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_recovery_injector_ops = {
+ .content = bbstreamer_recovery_injector_content,
+ .finalize = bbstreamer_recovery_injector_finalize,
+ .free = bbstreamer_recovery_injector_free
+};
+
+/*
+ * Create a bbstreamer that can edit recovery data into an archive stream.
+ *
+ * The input should be a series of typed chunks (not BBSTREAMER_UNKNOWN) as
+ * per the conventions described in bbstreamer.h; the chunks forwarded to
+ * the next bbstreamer will be similarly typed, but the
+ * BBSTREAMER_MEMBER_HEADER chunks may be zero-length in cases where we've
+ * edited the archive stream.
+ *
+ * Our goal is to do one of the following three things with the content passed
+ * via recoveryconfcontents: (1) if is_recovery_guc_supported is false, then
+ * put the content into recovery.conf, replacing any existing archive member
+ * by that name; (2) if is_recovery_guc_supported is true and
+ * postgresql.auto.conf exists in the archive, then append the content
+ * provided to the existing file; and (3) if is_recovery_guc_supported is
+ * true but postgresql.auto.conf does not exist in the archive, then create
+ * it with the specified content.
+ *
+ * In addition, if is_recovery_guc_supported is true, then we create a
+ * zero-length standby.signal file, dropping any file with that name from
+ * the archive.
+ */
+extern bbstreamer *
+bbstreamer_recovery_injector_new(bbstreamer *next,
+ bool is_recovery_guc_supported,
+ PQExpBuffer recoveryconfcontents)
+{
+ bbstreamer_recovery_injector *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_recovery_injector));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_recovery_injector_ops;
+ streamer->base.bbs_next = next;
+ streamer->is_recovery_guc_supported = is_recovery_guc_supported;
+ streamer->recoveryconfcontents = recoveryconfcontents;
+
+ return &streamer->base;
+}
+
+/*
+ * Handle each chunk of tar content while injecting recovery configuration.
+ */
+static void
+bbstreamer_recovery_injector_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_recovery_injector *mystreamer;
+
+ mystreamer = (bbstreamer_recovery_injector *) streamer;
+ Assert(member != NULL || context == BBSTREAMER_ARCHIVE_TRAILER);
+
+ switch (context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+ /* Must copy provided data so we have the option to modify it. */
+ memcpy(&mystreamer->member, member, sizeof(bbstreamer_member));
+
+ /*
+ * On v12+, skip standby.signal and edit postgresql.auto.conf; on
+ * older versions, skip recovery.conf.
+ */
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "standby.signal") == 0);
+ mystreamer->is_postgresql_auto_conf =
+ (strcmp(member->pathname, "postgresql.auto.conf") == 0);
+ if (mystreamer->is_postgresql_auto_conf)
+ {
+ /* Remember we saw it so we don't add it again. */
+ mystreamer->found_postgresql_auto_conf = true;
+
+ /* Increment length by data to be injected. */
+ mystreamer->member.size +=
+ mystreamer->recoveryconfcontents->len;
+
+ /*
+ * Zap data and len because the archive header is no
+ * longer valid; some subsequent bbstreamer must
+ * regenerate it if it's necessary.
+ */
+ data = NULL;
+ len = 0;
+ }
+ }
+ else
+ mystreamer->skip_file =
+ (strcmp(member->pathname, "recovery.conf") == 0);
+
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+ /* Do not forward if the file is to be skipped. */
+ if (mystreamer->skip_file)
+ return;
+
+ /* Append provided content to whatever we already sent. */
+ if (mystreamer->is_postgresql_auto_conf)
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+ if (mystreamer->is_recovery_guc_supported)
+ {
+ /*
+ * If we didn't already find (and thus modify)
+ * postgresql.auto.conf, inject it as an additional archive
+ * member now.
+ */
+ if (!mystreamer->found_postgresql_auto_conf)
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "postgresql.auto.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+
+ /* Inject empty standby.signal file. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "standby.signal", "", 0);
+ }
+ else
+ {
+ /* Inject recovery.conf file with specified contents. */
+ bbstreamer_inject_file(mystreamer->base.bbs_next,
+ "recovery.conf",
+ mystreamer->recoveryconfcontents->data,
+ mystreamer->recoveryconfcontents->len);
+ }
+
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while injecting recovery settings");
+ exit(1);
+ }
+
+ bbstreamer_content(mystreamer->base.bbs_next, &mystreamer->member,
+ data, len, context);
+}
+
+/*
+ * End-of-stream processing for this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_recovery_injector_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Inject a member into the archive with specified contents.
+ */
+void
+bbstreamer_inject_file(bbstreamer *streamer, char *pathname, char *data,
+ int len)
+{
+ bbstreamer_member member;
+
+ strlcpy(member.pathname, pathname, MAXPGPATH);
+ member.size = len;
+ member.mode = pg_file_create_mode;
+ member.is_directory = false;
+ member.is_link = false;
+ member.linktarget[0] = '\0';
+
+ /*
+ * There seems to be no principled argument for these values, but they are
+ * what PostgreSQL has historically used.
+ */
+ member.uid = 04000;
+ member.gid = 02000;
+
+ /*
+ * We don't know here how to generate valid member headers and trailers
+ * for the archiving format in use, so if those are needed, some successor
+ * bbstreamer will have to generate them using the data from 'member'.
+ */
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_HEADER);
+ bbstreamer_content(streamer, &member, data, len,
+ BBSTREAMER_MEMBER_CONTENTS);
+ bbstreamer_content(streamer, &member, NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+}
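(Aside, not part of the patch: the same entry point makes it easy to add any
extra member to an archive being rewritten. A hypothetical sketch follows;
'manifest_buf' is assumed to be a PQExpBuffer holding the file contents, and
the successor chain must include a bbstreamer_tar_archiver, since the
zero-length MEMBER_HEADER chunk sent here must be replaced with a real tar
header.)

    bbstreamer_inject_file(streamer, "backup_manifest",
                           manifest_buf->data, manifest_buf->len);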
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
new file mode 100644
index 0000000000..5a9f587dca
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -0,0 +1,444 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_tar.c
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_tar.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "pgtar.h"
+
+typedef struct bbstreamer_tar_parser
+{
+ bbstreamer base;
+ bbstreamer_archive_context next_context;
+ bbstreamer_member member;
+ size_t file_bytes_sent;
+ size_t pad_bytes_expected;
+} bbstreamer_tar_parser;
+
+typedef struct bbstreamer_tar_archiver
+{
+ bbstreamer base;
+ bool rearchive_member;
+} bbstreamer_tar_archiver;
+
+static void bbstreamer_tar_parser_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_parser_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_parser_free(bbstreamer *streamer);
+static bool bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer);
+
+const bbstreamer_ops bbstreamer_tar_parser_ops = {
+ .content = bbstreamer_tar_parser_content,
+ .finalize = bbstreamer_tar_parser_finalize,
+ .free = bbstreamer_tar_parser_free
+};
+
+static void bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_archiver_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_archiver_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_archiver_ops = {
+ .content = bbstreamer_tar_archiver_content,
+ .finalize = bbstreamer_tar_archiver_finalize,
+ .free = bbstreamer_tar_archiver_free
+};
+
+/*
+ * Create a bbstreamer that can parse a stream of content as tar data.
+ *
+ * The input should be a series of BBSTREAMER_UNKNOWN chunks; the bbstreamer
+ * specified by 'next' will receive a series of typed chunks, as per the
+ * conventions described in bbstreamer.h.
+ */
+extern bbstreamer *
+bbstreamer_tar_parser_new(bbstreamer *next)
+{
+ bbstreamer_tar_parser *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_parser));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_parser_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->next_context = BBSTREAMER_MEMBER_HEADER;
+
+ return &streamer->base;
+}
+
+/*
+ * Parse unknown content as tar data.
+ */
+static void
+bbstreamer_tar_parser_content(bbstreamer *streamer, bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+ size_t nbytes;
+
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ while (len > 0)
+ {
+ switch (mystreamer->next_context)
+ {
+ case BBSTREAMER_MEMBER_HEADER:
+
+ /*
+ * If we're expecting an archive member header, accumulate a
+ * full block of data before doing anything further.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ TAR_BLOCK_SIZE))
+ return;
+
+ /*
+ * Now we can process the header and get ready to process the
+ * file contents; however, we might find out that what we
+ * thought was the next file header is actually the start of
+ * the archive trailer. Switch modes accordingly.
+ */
+ if (bbstreamer_tar_header(mystreamer))
+ {
+ if (mystreamer->member.size == 0)
+ {
+ /* No content; trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Expect contents. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_CONTENTS;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ mystreamer->file_bytes_sent = 0;
+ }
+ else
+ mystreamer->next_context = BBSTREAMER_ARCHIVE_TRAILER;
+ break;
+
+ case BBSTREAMER_MEMBER_CONTENTS:
+
+ /*
+ * Send as much content as we have, but not more than the
+ * remaining file length.
+ */
+ Assert(mystreamer->file_bytes_sent < mystreamer->member.size);
+ nbytes = mystreamer->member.size - mystreamer->file_bytes_sent;
+ nbytes = Min(nbytes, len);
+ Assert(nbytes > 0);
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, nbytes,
+ BBSTREAMER_MEMBER_CONTENTS);
+ mystreamer->file_bytes_sent += nbytes;
+ data += nbytes;
+ len -= nbytes;
+
+ /*
+ * If we've not yet sent the whole file, then there's more
+ * content to come; otherwise, it's time to expect the file
+ * trailer.
+ */
+ Assert(mystreamer->file_bytes_sent <= mystreamer->member.size);
+ if (mystreamer->file_bytes_sent == mystreamer->member.size)
+ {
+ if (mystreamer->pad_bytes_expected == 0)
+ {
+ /* Trailer is zero-length. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ NULL, 0,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ }
+ else
+ {
+ /* Trailer is not zero-length. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_TRAILER;
+ }
+ mystreamer->base.bbs_buffer.len = 0;
+ }
+ break;
+
+ case BBSTREAMER_MEMBER_TRAILER:
+
+ /*
+ * If we're expecting an archive member trailer, accumulate
+ * the expected number of padding bytes before sending
+ * anything onward.
+ */
+ if (!bbstreamer_buffer_until(streamer, &data, &len,
+ mystreamer->pad_bytes_expected))
+ return;
+
+ /* OK, now we can send it. */
+ bbstreamer_content(mystreamer->base.bbs_next,
+ &mystreamer->member,
+ data, mystreamer->pad_bytes_expected,
+ BBSTREAMER_MEMBER_TRAILER);
+
+ /* Expect next file header. */
+ mystreamer->next_context = BBSTREAMER_MEMBER_HEADER;
+ mystreamer->base.bbs_buffer.len = 0;
+ break;
+
+ case BBSTREAMER_ARCHIVE_TRAILER:
+
+ /*
+ * We've seen an end-of-archive indicator, so anything more is
+ * buffered and sent as part of the archive trailer. But we
+ * don't expect more than 2 blocks.
+ */
+ bbstreamer_buffer_bytes(streamer, &data, &len, len);
+ if (mystreamer->base.bbs_buffer.len > 2 * TAR_BLOCK_SIZE)
+ {
+ pg_log_error("tar file trailer exceeds 2 blocks");
+ exit(1);
+ }
+ return;
+
+ default:
+ /* Shouldn't happen. */
+ pg_log_error("unexpected state while parsing tar archive");
+ exit(1);
+ }
+ }
+}
+
+/*
+ * Parse a file header within a tar stream.
+ *
+ * The return value is true if we found a file header and passed it on to the
+ * next bbstreamer; it is false if we have reached the archive trailer.
+ */
+static bool
+bbstreamer_tar_header(bbstreamer_tar_parser *mystreamer)
+{
+ bool has_nonzero_byte = false;
+ int i;
+ bbstreamer_member *member = &mystreamer->member;
+ char *buffer = mystreamer->base.bbs_buffer.data;
+
+ Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
+
+ /* Check whether we've got a block of all zero bytes. */
+ for (i = 0; i < TAR_BLOCK_SIZE; ++i)
+ {
+ if (buffer[i] != '\0')
+ {
+ has_nonzero_byte = true;
+ break;
+ }
+ }
+
+ /*
+ * If the entire block was zeros, this is the end of the archive, not the
+ * start of the next file.
+ */
+ if (!has_nonzero_byte)
+ return false;
+
+ /*
+ * Parse key fields out of the header.
+ *
+ * FIXME: It's terrible that we use hard-coded values here instead of some
+ * more principled approach. It's been like this for a long time, but we
+ * ought to do better.
+ */
+ strlcpy(member->pathname, &buffer[0], MAXPGPATH);
+ if (member->pathname[0] == '\0')
+ {
+ pg_log_error("tar member has empty name");
+ exit(1);
+ }
+ member->size = read_tar_number(&buffer[124], 12);
+ member->mode = read_tar_number(&buffer[100], 8);
+ member->uid = read_tar_number(&buffer[108], 8);
+ member->gid = read_tar_number(&buffer[116], 8);
+ member->is_directory = (buffer[156] == '5');
+ member->is_link = (buffer[156] == '2');
+ if (member->is_link)
+ strlcpy(member->linktarget, &buffer[157], 100);
+
+ /* Compute number of padding bytes. */
+ mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
+
+ /* Forward the entire header to the next bbstreamer. */
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ buffer, TAR_BLOCK_SIZE,
+ BBSTREAMER_MEMBER_HEADER);
+
+ return true;
+}
+
+/*
+ * End-of-stream processing for a tar parser.
+ */
+static void
+bbstreamer_tar_parser_finalize(bbstreamer *streamer)
+{
+ bbstreamer_tar_parser *mystreamer = (bbstreamer_tar_parser *) streamer;
+
+ if (mystreamer->next_context != BBSTREAMER_ARCHIVE_TRAILER &&
+ (mystreamer->next_context != BBSTREAMER_MEMBER_HEADER ||
+ mystreamer->base.bbs_buffer.len > 0))
+ {
+ pg_log_error("COPY stream ended before last file was finished");
+ exit(1);
+ }
+
+ /* Send the archive trailer, even if empty. */
+ bbstreamer_content(streamer->bbs_next, NULL,
+ streamer->bbs_buffer.data, streamer->bbs_buffer.len,
+ BBSTREAMER_ARCHIVE_TRAILER);
+
+ /* Now finalize successor. */
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar parser.
+ */
+static void
+bbstreamer_tar_parser_free(bbstreamer *streamer)
+{
+ pfree(streamer->bbs_buffer.data);
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
+
+/*
+ * Create a bbstreamer that can generate a tar archive.
+ *
+ * This is intended to be usable either for generating a brand-new tar archive
+ * or for modifying one on the fly. The input should be a series of typed
+ * chunks (i.e. not BBSTREAMER_UNKNOWN). See also the comments for
+ * bbstreamer_tar_parser_content.
+ */
+extern bbstreamer *
+bbstreamer_tar_archiver_new(bbstreamer *next)
+{
+ bbstreamer_tar_archiver *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_tar_archiver));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_tar_archiver_ops;
+ streamer->base.bbs_next = next;
+
+ return &streamer->base;
+}
+
+/*
+ * Fix up the stream of input chunks to create a valid tar file.
+ *
+ * If a BBSTREAMER_MEMBER_HEADER chunk is of size 0, it is replaced with a
+ * newly-constructed tar header. If it is of size TAR_BLOCK_SIZE, it is
+ * passed through without change. Any other size is a fatal error (and
+ * indicates a bug).
+ *
+ * Whenever a new BBSTREAMER_MEMBER_HEADER chunk is constructed, the
+ * corresponding BBSTREAMER_MEMBER_TRAILER chunk is also constructed from
+ * scratch. Specifically, we construct a block of zero bytes sufficient to
+ * pad out to a block boundary, as required by the tar format. Other
+ * BBSTREAMER_MEMBER_TRAILER chunks are passed through without change.
+ *
+ * Any BBSTREAMER_MEMBER_CONTENTS chunks are passed through without change.
+ *
+ * The BBSTREAMER_ARCHIVE_TRAILER chunk is replaced with two
+ * blocks of zero bytes. Not all tar programs require this, but apparently
+ * some do. The server does not supply this trailer. If no archive trailer is
+ * present, one will be added by bbstreamer_tar_parser_finalize.
+ */
+static void
+bbstreamer_tar_archiver_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_tar_archiver *mystreamer = (bbstreamer_tar_archiver *) streamer;
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ Assert(context != BBSTREAMER_UNKNOWN);
+
+ if (context == BBSTREAMER_MEMBER_HEADER && len != TAR_BLOCK_SIZE)
+ {
+ Assert(len == 0);
+
+ /* Replace zero-length tar header with a newly constructed one. */
+ tarCreateHeader(buffer, member->pathname, NULL,
+ member->size, member->mode, member->uid, member->gid,
+ time(NULL));
+ data = buffer;
+ len = TAR_BLOCK_SIZE;
+
+ /* Also make a note to replace padding, in case size changed. */
+ mystreamer->rearchive_member = true;
+ }
+ else if (context == BBSTREAMER_MEMBER_TRAILER &&
+ mystreamer->rearchive_member)
+ {
+ int pad_bytes = tarPaddingBytesRequired(member->size);
+
+ /* Also replace padding, if we regenerated the header. */
+ memset(buffer, 0, pad_bytes);
+ data = buffer;
+ len = pad_bytes;
+
+ /* Don't do this again unless we replace another header. */
+ mystreamer->rearchive_member = false;
+ }
+ else if (context == BBSTREAMER_ARCHIVE_TRAILER)
+ {
+ /* Trailer should always be two blocks of zero bytes. */
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ data = buffer;
+ len = 2 * TAR_BLOCK_SIZE;
+ }
+
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * End-of-stream processing for a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_finalize(bbstreamer *streamer)
+{
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar archiver.
+ */
+static void
+bbstreamer_tar_archiver_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
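(Aside, not part of the patch: combined with the extractor from
bbstreamer_file.c, the plain-format path reduces to a two-element chain,
roughly as CreateBackupStreamer() sets it up below; get_tablespace_mapping()
and progress_update_filename() are the pg_basebackup.c callbacks.)

    bbstreamer *streamer;

    streamer = bbstreamer_extractor_new(basedir,
                                        get_tablespace_mapping,
                                        progress_update_filename);
    streamer = bbstreamer_tar_parser_new(streamer);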
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 27ee6394cf..67d01d8b6e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -28,18 +28,13 @@
#endif
#include "access/xlog_internal.h"
+#include "bbstreamer.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
-#include "common/string.h"
#include "fe_utils/option_utils.h"
#include "fe_utils/recovery_gen.h"
-#include "fe_utils/string_utils.h"
#include "getopt_long.h"
-#include "libpq-fe.h"
-#include "pgtar.h"
-#include "pgtime.h"
-#include "pqexpbuffer.h"
#include "receivelog.h"
#include "replication/basebackup.h"
#include "streamutil.h"
@@ -62,34 +57,9 @@ typedef struct TablespaceList
typedef struct WriteTarState
{
int tablespacenum;
- char filename[MAXPGPATH];
- FILE *tarfile;
- char tarhdr[TAR_BLOCK_SIZE];
- bool basetablespace;
- bool in_tarhdr;
- bool skip_file;
- bool is_recovery_guc_supported;
- bool is_postgresql_auto_conf;
- bool found_postgresql_auto_conf;
- int file_padding_len;
- size_t tarhdrsz;
- pgoff_t filesz;
-#ifdef HAVE_LIBZ
- gzFile ztarfile;
-#endif
+ bbstreamer *streamer;
} WriteTarState;
-typedef struct UnpackTarState
-{
- int tablespacenum;
- char current_path[MAXPGPATH];
- char filename[MAXPGPATH];
- const char *mapped_tblspc_path;
- pgoff_t current_len_left;
- int current_padding;
- FILE *file;
-} UnpackTarState;
-
typedef struct WriteManifestState
{
char filename[MAXPGPATH];
@@ -161,10 +131,11 @@ static bool found_existing_xlogdir = false;
static bool made_tablespace_dirs = false;
static bool found_tablespace_dirs = false;
-/* Progress counters */
+/* Progress indicators */
static uint64 totalsize_kb;
static uint64 totaldone;
static int tablespacecount;
+static const char *progress_filename;
/* Pipe to communicate with background wal receiver process */
#ifndef WIN32
@@ -190,14 +161,15 @@ static PQExpBuffer recoveryconfcontents = NULL;
/* Function headers */
static void usage(void);
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
-static void progress_report(int tablespacenum, const char *filename, bool force,
- bool finished);
-
-static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
+static void progress_update_filename(const char *filename);
+static void progress_report(int tablespacenum, bool force, bool finished);
+
+static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported);
+static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
-static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
-static void ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf,
- void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
void *callback_data);
@@ -360,21 +332,6 @@ tablespace_list_append(const char *arg)
}
-#ifdef HAVE_LIBZ
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
static void
usage(void)
{
@@ -763,6 +720,14 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
}
}
+/*
+ * Callback to update our notion of the current filename.
+ */
+static void
+progress_update_filename(const char *filename)
+{
+ progress_filename = filename;
+}
/*
* Print a progress report based on the global variables. If verbose output
@@ -775,8 +740,7 @@ verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
* is moved to the next line.
*/
static void
-progress_report(int tablespacenum, const char *filename,
- bool force, bool finished)
+progress_report(int tablespacenum, bool force, bool finished)
{
int percent;
char totaldone_str[32];
@@ -811,7 +775,7 @@ progress_report(int tablespacenum, const char *filename,
#define VERBOSE_FILENAME_LENGTH 35
if (verbose)
{
- if (!filename)
+ if (!progress_filename)
/*
* No filename given, so clear the status line (used for last
@@ -827,7 +791,7 @@ progress_report(int tablespacenum, const char *filename,
VERBOSE_FILENAME_LENGTH + 5, "");
else
{
- bool truncate = (strlen(filename) > VERBOSE_FILENAME_LENGTH);
+ bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
fprintf(stderr,
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
@@ -841,7 +805,7 @@ progress_report(int tablespacenum, const char *filename,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
/* Truncate filename at beginning if it's too long */
- truncate ? filename + strlen(filename) - VERBOSE_FILENAME_LENGTH + 3 : filename);
+ truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
}
}
else
@@ -987,257 +951,170 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
}
/*
- * Write a piece of tar data
+ * Figure out what to do with an archive received from the server based on
+ * the options selected by the user. We may just write the results directly
+ * to a file, or we might compress first, or we might extract the tar file
+ * and write each member separately. This function doesn't do any of that
+ * directly, but it works out what kind of bbstreamer we need to create so
+ * that the right stuff happens when, down the road, we actually receive
+ * the data.
*/
-static void
-writeTarData(WriteTarState *state, char *buf, int r)
+static bbstreamer *
+CreateBackupStreamer(char *archive_name, char *spclocation,
+ bbstreamer **manifest_inject_streamer_p,
+ bool is_recovery_guc_supported)
{
-#ifdef HAVE_LIBZ
- if (state->ztarfile != NULL)
- {
- errno = 0;
- if (gzwrite(state->ztarfile, buf, r) != r)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- state->filename, get_gz_error(state->ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- errno = 0;
- if (fwrite(buf, r, 1, state->tarfile) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-}
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer = NULL;
+ bool inject_manifest;
+ bool must_parse_archive;
-/*
- * Receive a tar format file from the connection to the server, and write
- * the data from this file directly into a tar file. If compression is
- * enabled, the data will be compressed while written to the file.
- *
- * The file will be named base.tar[.gz] if it's for the main data directory
- * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
- *
- * No attempt to inspect or validate the contents of the file is done.
- */
-static void
-ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- char zerobuf[TAR_BLOCK_SIZE * 2];
- WriteTarState state;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
- state.basetablespace = PQgetisnull(res, rownum, 0);
- state.in_tarhdr = true;
+ /*
+ * Normally, we emit the backup manifest as a separate file, but when
+ * we're writing a tarfile to stdout, we don't have that option, so
+ * include it in the one tarfile we've got.
+ */
+ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* recovery.conf is integrated into postgresql.conf in 12 and newer */
- if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC)
- state.is_recovery_guc_supported = true;
+ /*
+ * We have to parse the archive if (1) we're supposed to extract it, or if
+ * (2) we need to inject backup_manifest or recovery configuration into it.
+ */
+ must_parse_archive = (format == 'p' || inject_manifest ||
+ (spclocation == NULL && writerecoveryconf));
- if (state.basetablespace)
+ if (format == 'p')
{
+ const char *directory;
+
/*
- * Base tablespaces
+ * In plain format, we must extract the archive. The data for the main
+ * tablespace will be written to the base directory, and the data for
+ * other tablespaces will be written to the directory where they're
+ * located on the server, after applying any user-specified tablespace
+ * mappings.
*/
- if (strcmp(basedir, "-") == 0)
- {
-#ifdef WIN32
- _setmode(fileno(stdout), _O_BINARY);
-#endif
-
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- int fd = dup(fileno(stdout));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- state.ztarfile = gzdopen(fd, "wb");
- if (state.ztarfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
-
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- state.tarfile = stdout;
- strcpy(state.filename, "-");
- }
- else
- {
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar.gz", basedir);
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- snprintf(state.filename, sizeof(state.filename),
- "%s/base.tar", basedir);
- state.tarfile = fopen(state.filename, "wb");
- }
- }
+ directory = spclocation == NULL ? basedir
+ : get_tablespace_mapping(spclocation);
+ streamer = bbstreamer_extractor_new(directory,
+ get_tablespace_mapping,
+ progress_update_filename);
}
else
{
+ FILE *archive_file;
+ char archive_filename[MAXPGPATH];
+
/*
- * Specific tablespace
+ * In tar format, we just write the archive without extracting it.
+ * Normally, we write it to the archive name provided by the caller,
+ * but when the base directory is "-" that means we need to write
+ * to standard output.
*/
-#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ if (strcmp(basedir, "-") == 0)
{
- snprintf(state.filename, sizeof(state.filename),
- "%s/%s.tar.gz",
- basedir, PQgetvalue(res, rownum, 0));
- state.ztarfile = gzopen(state.filename, "wb");
- if (gzsetparams(state.ztarfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(state.ztarfile));
- exit(1);
- }
+ snprintf(archive_filename, sizeof(archive_filename), "-");
+ archive_file = stdout;
}
else
-#endif
{
- snprintf(state.filename, sizeof(state.filename), "%s/%s.tar",
- basedir, PQgetvalue(res, rownum, 0));
- state.tarfile = fopen(state.filename, "wb");
+ snprintf(archive_filename, sizeof(archive_filename),
+ "%s/%s", basedir, archive_name);
+ archive_file = NULL;
}
- }
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
- {
- if (!state.ztarfile)
+ if (compresslevel != 0)
{
- /* Compression is in use */
- pg_log_error("could not create compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
+ strlcat(archive_filename, ".gz", sizeof(archive_filename));
+ streamer = bbstreamer_gzip_writer_new(archive_filename,
+ archive_file,
+ compresslevel);
}
- }
- else
+ else
#endif
- {
- /* Either no zlib support, or zlib support but compresslevel = 0 */
- if (!state.tarfile)
- {
- pg_log_error("could not create file \"%s\": %m", state.filename);
- exit(1);
- }
- }
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
- ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+
+ /*
+ * If we need to parse the archive for whatever reason, then we'll
+ * also need to re-archive, because, if the output format is tar, the
+ * only point of parsing the archive is to be able to inject stuff
+ * into it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_archiver_new(streamer);
+ progress_filename = archive_filename;
+ }
/*
- * End of copy data. If requested, and this is the base tablespace, write
- * configuration file into the tarfile. When done, close the file (but not
- * stdout).
- *
- * Also, write two completely empty blocks at the end of the tar file, as
- * required by some tar programs.
+ * If we're supposed to inject the backup manifest into the results,
+ * it should be done here, so that the file content can be injected
+ * directly, without worrying about the details of the tar format.
*/
+ if (inject_manifest)
+ manifest_inject_streamer = streamer;
- MemSet(zerobuf, 0, sizeof(zerobuf));
-
- if (state.basetablespace && writerecoveryconf)
+ /*
+ * If this is the main tablespace and we're supposed to write
+ * recovery information, arrange to do that.
+ */
+ if (spclocation == NULL && writerecoveryconf)
{
- char header[TAR_BLOCK_SIZE];
+ Assert(must_parse_archive);
+ streamer = bbstreamer_recovery_injector_new(streamer,
+ is_recovery_guc_supported,
+ recoveryconfcontents);
+ }
- /*
- * If postgresql.auto.conf has not been found in the streamed data,
- * add recovery configuration to postgresql.auto.conf if recovery
- * parameters are GUCs. If the instance connected to is older than
- * 12, create recovery.conf with this data otherwise.
- */
- if (!state.found_postgresql_auto_conf || !state.is_recovery_guc_supported)
- {
- int padding;
-
- tarCreateHeader(header,
- state.is_recovery_guc_supported ? "postgresql.auto.conf" : "recovery.conf",
- NULL,
- recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- padding = tarPaddingBytesRequired(recoveryconfcontents->len);
-
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, recoveryconfcontents->data,
- recoveryconfcontents->len);
- if (padding)
- writeTarData(&state, zerobuf, padding);
- }
+ /*
+ * If we're doing anything that involves understanding the contents of
+ * the archive, we'll need to parse it.
+ */
+ if (must_parse_archive)
+ streamer = bbstreamer_tar_parser_new(streamer);
- /*
- * standby.signal is supported only if recovery parameters are GUCs.
- */
- if (state.is_recovery_guc_supported)
- {
- tarCreateHeader(header, "standby.signal", NULL,
- 0, /* zero-length file */
- pg_file_create_mode, 04000, 02000,
- time(NULL));
+ /* Return the results. */
+ *manifest_inject_streamer_p = manifest_inject_streamer;
+ return streamer;
+}
- writeTarData(&state, header, sizeof(header));
+/*
+ * Receive raw tar data from the server, and stream it to the appropriate
+ * location. If we're writing a single tarfile to standard output, also
+ * receive the backup manifest and inject it into that tarfile.
+ */
+static void
+ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
+ int tablespacenum)
+{
+ WriteTarState state;
+ bbstreamer *manifest_inject_streamer;
+ bool is_recovery_guc_supported;
- /*
- * we don't need to pad out to a multiple of the tar block size
- * here, because the file is zero length, which is a multiple of
- * any block size.
- */
- }
- }
+ /* Pass all COPY data through to the backup streamer. */
+ memset(&state, 0, sizeof(state));
+ is_recovery_guc_supported =
+ PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ state.streamer = CreateBackupStreamer(archive_name, spclocation,
+ &manifest_inject_streamer,
+ is_recovery_guc_supported);
+ state.tablespacenum = tablespacenum;
+ ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
+ progress_filename = NULL;
/*
- * Normally, we emit the backup manifest as a separate file, but when
- * we're writing a tarfile to stdout, we don't have that option, so
- * include it in the one tarfile we've got.
+ * The decision as to whether we need to inject the backup manifest into
+ * the output at this stage is made by CreateBackupStreamer; if that is
+ * needed, manifest_inject_streamer will be non-NULL; otherwise, it will
+ * be NULL.
*/
- if (strcmp(basedir, "-") == 0 && manifest)
+ if (manifest_inject_streamer != NULL)
{
- char header[TAR_BLOCK_SIZE];
PQExpBufferData buf;
+ /* Slurp the entire backup manifest into a buffer. */
initPQExpBuffer(&buf);
ReceiveBackupManifestInMemory(conn, &buf);
if (PQExpBufferDataBroken(buf))
@@ -1245,42 +1122,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
pg_log_error("out of memory");
exit(1);
}
- tarCreateHeader(header, "backup_manifest", NULL, buf.len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
- writeTarData(&state, header, sizeof(header));
- writeTarData(&state, buf.data, buf.len);
- termPQExpBuffer(&buf);
- }
- /* 2 * TAR_BLOCK_SIZE bytes empty data at end of file */
- writeTarData(&state, zerobuf, sizeof(zerobuf));
+ /* Inject it into the output tarfile. */
+ bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
+ buf.data, buf.len);
-#ifdef HAVE_LIBZ
- if (state.ztarfile != NULL)
- {
- if (gzclose(state.ztarfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %s",
- state.filename, get_gz_error(state.ztarfile));
- exit(1);
- }
- }
- else
-#endif
- {
- if (strcmp(basedir, "-") != 0)
- {
- if (fclose(state.tarfile) != 0)
- {
- pg_log_error("could not close file \"%s\": %m",
- state.filename);
- exit(1);
- }
- }
+ /* Free memory. */
+ termPQExpBuffer(&buf);
}
- progress_report(rownum, state.filename, true, false);
+ /* Cleanup. */
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+
+ progress_report(tablespacenum, true, false);
/*
* Do not sync the resulting tar file yet, all files are synced once at
@@ -1296,184 +1151,10 @@ ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
{
WriteTarState *state = callback_data;
- if (!writerecoveryconf || !state->basetablespace)
- {
- /*
- * When not writing config file, or when not working on the base
- * tablespace, we never have to look for an existing configuration
- * file in the stream.
- */
- writeTarData(state, copybuf, r);
- }
- else
- {
- /*
- * Look for a config file in the existing tar stream. If it's there,
- * we must skip it so we can later overwrite it with our own version
- * of the file.
- *
- * To do this, we have to process the individual files inside the TAR
- * stream. The stream consists of a header and zero or more chunks,
- * each with a length equal to TAR_BLOCK_SIZE. The stream from the
- * server is broken up into smaller pieces, so we have to track the
- * size of the files to find the next header structure.
- */
- int rr = r;
- int pos = 0;
-
- while (rr > 0)
- {
- if (state->in_tarhdr)
- {
- /*
- * We're currently reading a header structure inside the TAR
- * stream, i.e. the file metadata.
- */
- if (state->tarhdrsz < TAR_BLOCK_SIZE)
- {
- /*
- * Copy the header structure into tarhdr in case the
- * header is not aligned properly or it's not returned in
- * whole by the last PQgetCopyData call.
- */
- int hdrleft;
- int bytes2copy;
-
- hdrleft = TAR_BLOCK_SIZE - state->tarhdrsz;
- bytes2copy = (rr > hdrleft ? hdrleft : rr);
-
- memcpy(&state->tarhdr[state->tarhdrsz], copybuf + pos,
- bytes2copy);
-
- rr -= bytes2copy;
- pos += bytes2copy;
- state->tarhdrsz += bytes2copy;
- }
- else
- {
- /*
- * We have the complete header structure in tarhdr, look
- * at the file metadata: we may want append recovery info
- * into postgresql.auto.conf and skip standby.signal file
- * if recovery parameters are integrated as GUCs, and
- * recovery.conf otherwise. In both cases we must
- * calculate tar padding.
- */
- if (state->is_recovery_guc_supported)
- {
- state->skip_file =
- (strcmp(&state->tarhdr[0], "standby.signal") == 0);
- state->is_postgresql_auto_conf =
- (strcmp(&state->tarhdr[0], "postgresql.auto.conf") == 0);
- }
- else
- state->skip_file =
- (strcmp(&state->tarhdr[0], "recovery.conf") == 0);
-
- state->filesz = read_tar_number(&state->tarhdr[124], 12);
- state->file_padding_len =
- tarPaddingBytesRequired(state->filesz);
-
- if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* replace tar header */
- char header[TAR_BLOCK_SIZE];
-
- tarCreateHeader(header, "postgresql.auto.conf", NULL,
- state->filesz + recoveryconfcontents->len,
- pg_file_create_mode, 04000, 02000,
- time(NULL));
-
- writeTarData(state, header, sizeof(header));
- }
- else
- {
- /* copy stream with padding */
- state->filesz += state->file_padding_len;
-
- if (!state->skip_file)
- {
- /*
- * If we're not skipping the file, write the tar
- * header unmodified.
- */
- writeTarData(state, state->tarhdr, TAR_BLOCK_SIZE);
- }
- }
-
- /* Next part is the file, not the header */
- state->in_tarhdr = false;
- }
- }
- else
- {
- /*
- * We're processing a file's contents.
- */
- if (state->filesz > 0)
- {
- /*
- * We still have data to read (and possibly write).
- */
- int bytes2write;
-
- bytes2write = (state->filesz > rr ? rr : state->filesz);
-
- if (!state->skip_file)
- writeTarData(state, copybuf + pos, bytes2write);
-
- rr -= bytes2write;
- pos += bytes2write;
- state->filesz -= bytes2write;
- }
- else if (state->is_recovery_guc_supported &&
- state->is_postgresql_auto_conf &&
- writerecoveryconf)
- {
- /* append recovery config to postgresql.auto.conf */
- int padding;
- int tailsize;
-
- tailsize = (TAR_BLOCK_SIZE - state->file_padding_len) + recoveryconfcontents->len;
- padding = tarPaddingBytesRequired(tailsize);
-
- writeTarData(state, recoveryconfcontents->data,
- recoveryconfcontents->len);
-
- if (padding)
- {
- char zerobuf[TAR_BLOCK_SIZE];
-
- MemSet(zerobuf, 0, sizeof(zerobuf));
- writeTarData(state, zerobuf, padding);
- }
+ bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
- /* skip original file padding */
- state->is_postgresql_auto_conf = false;
- state->skip_file = true;
- state->filesz += state->file_padding_len;
-
- state->found_postgresql_auto_conf = true;
- }
- else
- {
- /*
- * No more data in the current file, the next piece of
- * data (if any) will be a new file header structure.
- */
- state->in_tarhdr = true;
- state->skip_file = false;
- state->is_postgresql_auto_conf = false;
- state->tarhdrsz = 0;
- state->filesz = 0;
- }
- }
- }
- }
totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
+ progress_report(state->tablespacenum, false, false);
}
@@ -1498,242 +1179,6 @@ get_tablespace_mapping(const char *dir)
return dir;
}
-
-/*
- * Receive a tar format stream from the connection to the server, and unpack
- * the contents of it into a directory. Only files, directories and
- * symlinks are supported, no other kinds of special files.
- *
- * If the data is for the main data directory, it will be restored in the
- * specified directory. If it's for another tablespace, it will be restored
- * in the original or mapped directory.
- */
-static void
-ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
-{
- UnpackTarState state;
- bool basetablespace;
-
- memset(&state, 0, sizeof(state));
- state.tablespacenum = rownum;
-
- basetablespace = PQgetisnull(res, rownum, 0);
- if (basetablespace)
- strlcpy(state.current_path, basedir, sizeof(state.current_path));
- else
- strlcpy(state.current_path,
- get_tablespace_mapping(PQgetvalue(res, rownum, 1)),
- sizeof(state.current_path));
-
- ReceiveCopyData(conn, ReceiveTarAndUnpackCopyChunk, &state);
-
-
- if (state.file)
- fclose(state.file);
-
- progress_report(rownum, state.filename, true, false);
-
- if (state.file != NULL)
- {
- pg_log_error("COPY stream ended before last file was finished");
- exit(1);
- }
-
- if (basetablespace && writerecoveryconf)
- WriteRecoveryConfig(conn, basedir, recoveryconfcontents);
-
- /*
- * No data is synced here, everything is done for all tablespaces at the
- * end.
- */
-}
-
-static void
-ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
-{
- UnpackTarState *state = callback_data;
-
- if (state->file == NULL)
- {
-#ifndef WIN32
- int filemode;
-#endif
-
- /*
- * No current file, so this must be the header for a new file
- */
- if (r != TAR_BLOCK_SIZE)
- {
- pg_log_error("invalid tar block header size: %zu", r);
- exit(1);
- }
- totaldone += TAR_BLOCK_SIZE;
-
- state->current_len_left = read_tar_number(&copybuf[124], 12);
-
-#ifndef WIN32
- /* Set permissions on the file */
- filemode = read_tar_number(&copybuf[100], 8);
-#endif
-
- /*
- * All files are padded up to a multiple of TAR_BLOCK_SIZE
- */
- state->current_padding =
- tarPaddingBytesRequired(state->current_len_left);
-
- /*
- * First part of header is zero terminated filename
- */
- snprintf(state->filename, sizeof(state->filename),
- "%s/%s", state->current_path, copybuf);
- if (state->filename[strlen(state->filename) - 1] == '/')
- {
- /*
- * Ends in a slash means directory or symlink to directory
- */
- if (copybuf[156] == '5')
- {
- /*
- * Directory. Remove trailing slash first.
- */
- state->filename[strlen(state->filename) - 1] = '\0';
- if (mkdir(state->filename, pg_dir_create_mode) != 0)
- {
- /*
- * When streaming WAL, pg_wal (or pg_xlog for pre-9.6
- * clusters) will have been created by the wal receiver
- * process. Also, when the WAL directory location was
- * specified, pg_wal (or pg_xlog) has already been created
- * as a symbolic link before starting the actual backup.
- * So just ignore creation failures on related
- * directories.
- */
- if (!((pg_str_endswith(state->filename, "/pg_wal") ||
- pg_str_endswith(state->filename, "/pg_xlog") ||
- pg_str_endswith(state->filename, "/archive_status")) &&
- errno == EEXIST))
- {
- pg_log_error("could not create directory \"%s\": %m",
- state->filename);
- exit(1);
- }
- }
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on directory \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
- }
- else if (copybuf[156] == '2')
- {
- /*
- * Symbolic link
- *
- * It's most likely a link in pg_tblspc directory, to the
- * location of a tablespace. Apply any tablespace mapping
- * given on the command line (--tablespace-mapping). (We
- * blindly apply the mapping without checking that the link
- * really is inside pg_tblspc. We don't expect there to be
- * other symlinks in a data directory, but if there are, you
- * can call it an undocumented feature that you can map them
- * too.)
- */
- state->filename[strlen(state->filename) - 1] = '\0'; /* Remove trailing slash */
-
- state->mapped_tblspc_path =
- get_tablespace_mapping(&copybuf[157]);
- if (symlink(state->mapped_tblspc_path, state->filename) != 0)
- {
- pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
- state->filename, state->mapped_tblspc_path);
- exit(1);
- }
- }
- else
- {
- pg_log_error("unrecognized link indicator \"%c\"",
- copybuf[156]);
- exit(1);
- }
- return; /* directory or link handled */
- }
-
- /*
- * regular file
- */
- state->file = fopen(state->filename, "wb");
- if (!state->file)
- {
- pg_log_error("could not create file \"%s\": %m", state->filename);
- exit(1);
- }
-
-#ifndef WIN32
- if (chmod(state->filename, (mode_t) filemode))
- {
- pg_log_error("could not set permissions on file \"%s\": %m",
- state->filename);
- exit(1);
- }
-#endif
-
- if (state->current_len_left == 0)
- {
- /*
- * Done with this file, next one will be a new tar header
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* new file */
- else
- {
- /*
- * Continuing blocks in existing file
- */
- if (state->current_len_left == 0 && r == state->current_padding)
- {
- /*
- * Received the padding block for this file, ignore it and close
- * the file, then move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- totaldone += r;
- return;
- }
-
- errno = 0;
- if (fwrite(copybuf, r, 1, state->file) != 1)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to file \"%s\": %m", state->filename);
- exit(1);
- }
- totaldone += r;
- progress_report(state->tablespacenum, state->filename, false, false);
-
- state->current_len_left -= r;
- if (state->current_len_left == 0 && state->current_padding == 0)
- {
- /*
- * Received the last block, and there is no padding to be
- * expected. Close the file and move on to the next tar header.
- */
- fclose(state->file);
- state->file = NULL;
- return;
- }
- } /* continuing data in existing file */
-}
-
/*
* Receive the backup manifest file and write it out to a file.
*/
@@ -2032,16 +1477,32 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /*
- * Start receiving chunks
- */
+ /* Receive a tar file for each tablespace in turn */
for (i = 0; i < PQntuples(res); i++)
{
- if (format == 't')
- ReceiveTarFile(conn, res, i);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named base.tar
+ * if it's the main data directory or <tablespaceoid>.tar if it's for
+ * another tablespace. CreateBackupStreamer() will arrange to add .gz
+ * to the archive name if pg_basebackup is performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
else
- ReceiveAndUnpackTarFile(conn, res, i);
- } /* Loop over all tablespaces */
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
+ }
/*
* Now receive backup manifest, if appropriate.
@@ -2057,7 +1518,10 @@ BaseBackup(void)
ReceiveBackupManifest(conn);
if (showprogress)
- progress_report(PQntuples(res), NULL, true, true);
+ {
+ progress_filename = NULL;
+ progress_report(PQntuples(res), true, true);
+ }
PQclear(res);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bd9f2b62ef..4846efbe10 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3773,3 +3773,12 @@ bbsink
bbsink_ops
bbsink_state
bbsink_throttle
+bbstreamer
+bbstreamer_archive_context
+bbstreamer_bzip_writer
+bbstreamer_member
+bbstreamer_ops
+bbstreamer_plain_writer
+bbstreamer_recovery_injector
+bbstreamer_tar_archiver
+bbstreamer_tar_parser
--
2.24.3 (Apple Git-128)
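Incidentally, for anyone trying to follow CreateBackupStreamer() in the
pg_basebackup changes above: for a compressed tar backup that needs
recovery info injected, the chain of streamers it builds ends up looking
roughly like this (my sketch of the constructor calls, in bottom-up
order; not literal code from the patch):

    bbstreamer *streamer;

    /* final sink: write base.tar.gz into the output directory */
    streamer = bbstreamer_gzip_writer_new("base.tar.gz", NULL,
                                          compresslevel);
    /* rebuild a valid tar stream around whatever we inject */
    streamer = bbstreamer_tar_archiver_new(streamer);
    /* edit postgresql.auto.conf / add standby.signal as needed */
    streamer = bbstreamer_recovery_injector_new(streamer, true,
                                                recoveryconfcontents);
    /* split the incoming COPY stream back into tar members */
    streamer = bbstreamer_tar_parser_new(streamer);

and then each COPY chunk is simply pushed in via
bbstreamer_content(streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN).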
Attachment: v8-0004-Support-base-backup-targets.patch
From 3367800e57de87cc1b11561cab769e704f34276c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 15:41:44 -0400
Subject: [PATCH v8 4/5] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
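To make that concrete, here's approximately what the client ends up
sending for a couple of hypothetical invocations (paths invented, and
modulo the exact quoting of the new option syntax):

    pg_basebackup -Xnone -t blackhole
        -> BASE_BACKUP ( ..., TARGET 'blackhole' )
    pg_basebackup -Xnone -t server:/backups/mydb
        -> BASE_BACKUP ( ..., TARGET 'server', TARGET_DETAIL '/backups/mydb' )

In the latter case pg_basebackup writes nothing locally and just relays
the server's progress messages.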
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 302 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 199 +++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 588 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..90c366e8d3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,35 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">target</replaceable></option></term>
+ <term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Specifies the target for the base backup. The default target is
+ <literal>client</literal>, which specifies that the backup should
+ be sent to the machine where <application>pg_basebackup</application>
+ is running. If the target is instead set to
+ <literal>server:/some/path</literal>, the backup will be stored in
+ the <literal>/some/path</literal> directory on the machine where
+ the server is running. Storing a backup on the
+ server requires superuser privileges. Setting the target to
+ <literal>blackhole</literal> causes the contents of the backup to be
+ discarded and not stored anywhere. This should only be used for
+ testing purposes, as you will not end up with an actual backup.
+ </para>
+
+ <para>
+ Due to limitations of the implementation, WAL cannot be included
+ in backups with non-default targets; therefore, the use of
+ <literal>-Xnone</literal> is required when a non-default target
+ is specified.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
<term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 06ba23fca7..e42d12a863 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -692,6 +695,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -837,25 +842,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -867,6 +882,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
@@ -898,14 +929,38 @@ SendBaseBackup(BaseBackupCmd *cmd)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt.target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt.target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt.target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt.target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 57183f4d46..2e9058b041 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -131,11 +134,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -208,8 +212,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -290,8 +298,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..ce1b7b4797
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,302 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has
+ * been written to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Each chunk of manifest data is written to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index f163931f8a..f5202bae87 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -122,7 +122,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 4a5b7502f5..3b5e6b799a 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0a9eb8ca7e..f5d5d918a2 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -109,7 +109,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -126,6 +126,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -357,6 +358,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1216,15 +1219,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1296,24 +1306,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1684,7 +1702,35 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1779,8 +1825,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first once since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1791,7 +1842,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1874,7 +1926,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2008,8 +2060,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2031,7 +2086,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2065,6 +2120,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2115,7 +2171,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2156,6 +2212,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2288,18 +2347,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2309,6 +2400,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2325,6 +2426,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2358,8 +2462,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2380,6 +2494,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2387,6 +2502,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2396,6 +2514,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2438,11 +2559,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 36b9b76c5f..0e337a86f4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,9 +282,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index c22142365f..0b20981614 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8410829aef..710e83de1c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3774,6 +3774,7 @@ backup_target_type
bbsink
bbsink_copystream
bbsink_ops
+bbsink_server
bbsink_state
bbsink_throttle
bbstreamer
--
2.24.3 (Apple Git-128)
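A note for testers: with this patch, pointing the server target at a
relative path is rejected outright (to keep people from accidentally
backing up into the data directory), so I'd expect something like

    pg_basebackup -Xnone -t server:relative/path

to fail with the new "relative path not allowed for server backup"
error, with the exact client-side wrapping depending on where the error
surfaces.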
Attachment: v8-0005-Server-side-gzip-compression.patch
From 37c257bfb4507ec45b4f903e852f4ba25772eaf7 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 16:05:35 -0400
Subject: [PATCH v8 5/5] Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
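As a concrete (hypothetical) example:

    pg_basebackup -Ft -D /tmp/backup --server-compression=gzip4

should cause the server to compress each tar archive at level 4 before
transmitting it, so the client stores base.tar.gz rather than base.tar.
On the wire this is just a new COMPRESSION option to the BASE_BACKUP
command; the server-side parser accepts 'none', 'gzip', or 'gzipN' with
N between 1 and 9.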
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++-
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 303 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 44 +++-
src/include/replication/basebackup_sink.h | 1 +
7 files changed, 414 insertions(+), 5 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 90c366e8d3..a11800de65 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,31 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--server-compression=<replaceable class="parameter">method</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Allows the tar files generated for each tablespace to be compressed
+ on the server, before they are sent to the client. The default value
+ is <literal>none</literal>, which performs no compression. If set
+ to <literal>gzip</literal>, compression is performed using gzip and
+ the suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9</literal>
+ will provide the maximum compression that the <literal>gzip</literal>
+ algorithm can provide.
+ </para>
+ <para>
+ Since the write-ahead logs are fetched via a separate client
+ connection, they cannot be compressed using this option. See also
+ the <literal>--gzip</literal> and <literal>--compress</literal>
+ options.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-t <replaceable class="parameter">target</replaceable></option></term>
<term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
@@ -404,7 +429,9 @@ PostgreSQL documentation
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ automatically be added to all tar filenames. When this option is
+ used, compression is performed on the client side;
+ see also <literal>--server-compression</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e42d12a863..5f82993b78 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -697,11 +705,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -871,6 +881,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -965,6 +1000,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt.compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt.compression_level);
+
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..3d2fa93e55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,303 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index f5d5d918a2..176979f47d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -366,13 +367,15 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
+ printf(_(" --server-compression=none|gzip|gzip[1-9]\n"
+ " compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
- printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -z, --gzip compress tar output on client\n"));
+ printf(_(" -Z, --compress=0-9 compress tar output on client with given compression level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -987,7 +990,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -996,14 +1001,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1734,6 +1757,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2144,6 +2178,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2323,6 +2358,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 0e337a86f4..6bfea35c22 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
Thanks, Robert, for the patches.
I tried to take a backup using gzip compression and got a core.
$ pg_basebackup -t server:/tmp/data_gzip -Xnone --server-compression=gzip
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The backtrace:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000558264bfc40a in bbsink_cleanup (sink=0x55826684b5f8) at
../../../src/include/replication/basebackup_sink.h:268
#2 0x0000558264bfc838 in bbsink_forward_cleanup (sink=0x55826684b710) at
basebackup_sink.c:124
#3 0x0000558264bf4cab in bbsink_cleanup (sink=0x55826684b710) at
../../../src/include/replication/basebackup_sink.h:268
#4 0x0000558264bf7738 in SendBaseBackup (cmd=0x55826683bd10) at
basebackup.c:1020
#5 0x0000558264c10915 in exec_replication_command (
cmd_string=0x5582667bc580 "BASE_BACKUP ( LABEL 'pg_basebackup base
backup', PROGRESS, MANIFEST 'yes', TABLESPACE_MAP, TARGET 'server',
TARGET_DETAIL '/tmp/data_gzip', COMPRESSION 'gzip')") at walsender.c:1731
#6 0x0000558264c8a69b in PostgresMain (dbname=0x5582667e84d8 "",
username=0x5582667e84b8 "hadoop") at postgres.c:4493
#7 0x0000558264bb10a6 in BackendRun (port=0x5582667de160) at
postmaster.c:4560
#8 0x0000558264bb098b in BackendStartup (port=0x5582667de160) at
postmaster.c:4288
#9 0x0000558264bacb55 in ServerLoop () at postmaster.c:1801
#10 0x0000558264bac2ee in PostmasterMain (argc=3, argv=0x5582667b68c0) at
postmaster.c:1473
#11 0x0000558264aa0950 in main (argc=3, argv=0x5582667b68c0) at main.c:198
bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
bbsink_cleanup() callback is triggered, it tries to invoke a function that
is NULL. I think either bbsink_gzip_ops should set the cleanup callback
to bbsink_forward_cleanup or we should be calling the cleanup() callback
from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
we call from PG_CATCH, it will have a similar problem for gzip and other
sinks which may not need a custom cleanup() callback in case there is any
error before the backup could finish up normally.
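To make the failure mode concrete, here is a minimal sketch of the
dispatch involved, assuming the inline helper in basebackup_sink.h looks
roughly like the backtrace suggests; the bodies below are my
reconstruction, not the actual source:

/* Sketch only: the bbsink helpers jump through the sink's table of
 * function pointers (cf. basebackup_sink.h:268 in the backtrace). */
static inline void
bbsink_cleanup(bbsink *sink)
{
	/* With .cleanup left as NULL in bbsink_gzip_ops, this call
	 * jumps to address 0, matching frame #0 of the backtrace. */
	sink->bbs_ops->cleanup(sink);
}

/* Sketch only: if SendBaseBackup() invokes the cleanup callback from
 * PG_FINALLY(), it runs on both the success and the error path, so
 * every sink must supply at least bbsink_forward_cleanup. The wrapper
 * name here is hypothetical. */
static void
send_backup_sketch(bbsink *sink)
{
	PG_TRY();
	{
		/* run the backup through the sink chain (elided) */
	}
	PG_FINALLY();
	{
		bbsink_cleanup(sink);
	}
	PG_END_TRY();
}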
I have attached a patch to fix this.
Thoughts?
Regards,
Jeevan Ladhe
On Tue, Oct 26, 2021 at 1:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 15, 2021 at 8:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.
OK. Will give it some thought.
Here's a new patch set. I've tried adding a "cleanup" callback to the
bbsink method and ensuring that it gets called even in case of an
error. The code for that is untested since I have no use for it with
the existing basebackup sink types, so let me know how it goes when
you try to use it for LZ4.
I've also added documentation for the new pg_basebackup options in
this version, and I fixed up a couple of these patches to be
pgindent-clean when they previously were not.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
fix_gzip_cleanup_callback_core.patch (application/octet-stream)
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index 3d2fa93e55..432423bd55 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -50,7 +50,8 @@ const bbsink_ops bbsink_gzip_ops = {
.begin_manifest = bbsink_forward_begin_manifest,
.manifest_contents = bbsink_gzip_manifest_contents,
.end_manifest = bbsink_forward_end_manifest,
- .end_backup = bbsink_forward_end_backup
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
};
#endif
On Fri, Oct 29, 2021 at 8:59 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
bbsink_cleanup() callback is triggered, it tries to invoke a function that
is NULL. I think either bbsink_gzip_ops should set the cleanup callback
to bbsink_forward_cleanup or we should be calling the cleanup() callback
from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
we call from PG_CATCH, it will have a similar problem for gzip and other
sinks which may not need a custom cleanup() callback in case there is any
error before the backup could finish up normally.
I have attached a patch to fix this.
Yes, this is the right fix. Apologies for the oversight.
--
Robert Haas
EDB: http://www.enterprisedb.com
I have implemented the cleanup callback bbsink_lz4_cleanup() in the
attached patch.
Please have a look and let me know of any comments.
Regards,
Jeevan Ladhe
On Fri, Oct 29, 2021 at 6:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 29, 2021 at 8:59 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
bbsink_cleanup() callback is triggered, it tries to invoke a function that
is NULL. I think either bbsink_gzip_ops should set the cleanup callback
to bbsink_forward_cleanup or we should be calling the cleanup() callback
from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
we call from PG_CATCH, it will have a similar problem for gzip and other
sinks which may not need a custom cleanup() callback in case there is any
error before the backup could finish up normally.
I have attached a patch to fix this.
Yes, this is the right fix. Apologies for the oversight.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
lz4_compress_v7.patch (application/octet-stream)
commit 90fbe4d82d38de3083ddff9e50a23481f8d100f0
Author: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed Oct 27 18:04:58 2021 +0530
V7 LZ4 compression.
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5f82993b78..959e13400b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -899,6 +900,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1003,6 +1006,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..4a293a17b0
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,285 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer at least as
+ * large as computed by LZ4F_compressBound(), make sure the next
+ * sink's bbs_buffer is long enough to accommodate the compressed
+ * input buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed
+ * by LZ4F_compressBound(), ask the next sink to process the data so
+ * that we can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written = mysink->bytes_written + compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling LZ4F_freeCompressionContext() if needed to avoid memory leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 6bfea35c22..2558ce5ca2 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
On Tue, Nov 2, 2021 at 7:53 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
I have implemented the cleanup callback bbsink_lz4_cleanup() in the attached patch.
Please have a look and let me know of any comments.
Looks pretty good. I think you should work on stuff like documentation
and tests, and I need to do some work on that stuff, too. Also, I
think you should try to figure out how to support different
compression levels. For gzip, I did that by making gzip1..gzip9
possible compression settings. But that might not have been the right
idea because something like lz43 to mean lz4 at level 3 would be
confusing. Also, for the lz4 command line utility, there's not only
"lz4 -3" which means LZ4 with level 3 compression, but also "lz4
--fast=3" which selects "ultra-fast compression level 3" rather than
regular old level 3. And apparently LZ4 levels go up to 12 rather than
just 9 like gzip. I'm thinking maybe we should go with something like
"gzip@9" rather than just "gzip9" to mean gzip with compression level
9, and then things like "lz4@3" or "lz4@fast3" would select either the
regular compression levels or the ultra-fast compression levels.
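To make the proposed spelling concrete, here is a rough sketch, entirely
hypothetical, of how such an option string could be split at the '@'
separator; none of these names exist in the patch set, and validation of
the algorithm and detail parts is left to the caller:

#include <stdlib.h>
#include <string.h>

/* Hypothetical: split "gzip@9" into "gzip" and "9", or "lz4@fast3"
 * into "lz4" and "fast3". A string without '@' has no detail part. */
static void
split_compression_option(const char *optval, char **algorithm,
						 char **detail)
{
	const char *sep = strchr(optval, '@');

	if (sep == NULL)
	{
		*algorithm = strdup(optval);
		*detail = NULL;
	}
	else
	{
		*algorithm = strndup(optval, sep - optval);
		*detail = strdup(sep + 1);
	}
}

Under that scheme, "gzip@9" would yield algorithm "gzip" with detail
"9", while "lz4@fast3" would yield detail "fast3" to select the
ultra-fast levels.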
Meanwhile, I think it's probably OK for me to go ahead and commit
0001-0003 from my patches at this point, since it seems we have pretty
good evidence that the abstraction basically works, and there doesn't
seem to be any value in holding off and maybe having to do a bunch
more rebasing. We may also want to look into making -Fp work with
--server-compression, which would require pg_basebackup to know how to
decompress. I'm actually not sure if this is worthwhile; you'd need to
have a network connection slow enough that it's worth spending a lot
of CPU time compressing on the server and decompressing on the client
to make up for the cost of network transfer. But some people might
have that case. It might make it easier to test this, too, since we
probably can't rely on having an LZ4 binary installed. Another thing
that you probably need to investigate is also supporting client-side
LZ4 compression. I think that is probably a really desirable addition
to your patch set, since people might find it odd if that were
exclusively a server-side option. Hopefully it's not that much work.
One minor nitpick in terms of the code:
+ mysink->bytes_written = mysink->bytes_written + headerSize;
I would use += here.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Nov 2, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
Looks pretty good. I think you should work on stuff like documentation
and tests, and I need to do some work on that stuff, too. Also, I
think you should try to figure out how to support different
compression levels.
On second thought, maybe we don't need to do this. There's a thread on
"Teach pg_receivewal to use lz4 compression" which concluded that
supporting different compression levels was unnecessary.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Nov 2, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
Meanwhile, I think it's probably OK for me to go ahead and commit
0001-0003 from my patches at this point, since it seems we have pretty
good evidence that the abstraction basically works, and there doesn't
seem to be any value in holding off and maybe having to do a bunch
more rebasing.
I went ahead and committed 0001 and 0002, but got nervous about
proceeding with 0003. For those who may not have been following along
closely, what was 0003 and is now 0001 introduces a new COPY
subprotocol for taking backups. That probably needs to be documented
and as of now the patch does not do that, but the bigger question is
what to do about backward compatibility. I wrote the patch in such a
way that, post-patch, the server can do backups either the way that we
do them now, or the new way that it introduces, but I'm wondering if I
should rip that out and just support the new way only. If you run a
newer pg_basebackup against an older server, it will work, and still
does with the patch. If, however, you run an older pg_basebackup
against a newer server, it complains. For example running a pg13
pg_basebackup against a pg14 cluster produces this:
pg_basebackup: error: incompatible server version 14.0
pg_basebackup: removing data directory "pgstandby"
Now for all I know there is out-of-core software out there that speaks
the replication protocol and can take base backups using it and would
like it to continue working as it does today, and that's easy for me
to do, because that's the way the patch works. But on the other hand
since the patch adapts the in-core tools to use the new method when
talking to a new server, we wouldn't have test coverage for the old
method any more, which might possibly make it annoying to maintain.
But then again that is a problem we could leave for the future, and
rip it out then rather than now. I'm not sure which way to jump.
Anyone else have thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v9-0002-Support-base-backup-targets.patch (application/octet-stream)
From c6115df297caf26a294911b30926f28dfb361e50 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 15:41:44 -0400
Subject: [PATCH v9 2/3] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 302 ++++++++++++++++++
src/backend/replication/basebackup_throttle.c | 2 +-
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 199 +++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
10 files changed, 587 insertions(+), 59 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..90c366e8d3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,35 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">target</replaceable></option></term>
+ <term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Specifies the target for the base backup. The default target is
+ <literal>client</literal>, which specifies that the backup should
+ be sent to the machine where <application>pg_basebackup</application>
+ is running. If the target is instead set to
+ <literal>server:/some/path</literal>, the backup will be stored on
+ the machine where the server is running in the
+ <literal>/some/path</literal> directory. Storing a backup on the
+ server requires superuser privileges. Setting the target to
+ <literal>blackhole</literal> causes the contents of the backup to be
+ discarded and not stored anywhere. This should only be used for
+ testing purposes, as you will not end up with an actual backup.
+ </para>
+
+ <para>
+ Due to limitations of the implementation, WAL cannot be included
+ in backups with non-default targets; therefore, the use of
+ <literal>-Xnone</literal> is required when a non-default target
+ is specified.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
<term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 096455ad02..ac1e0d8733 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -691,6 +694,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -836,25 +841,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -866,6 +881,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
@@ -897,14 +928,38 @@ SendBaseBackup(BaseBackupCmd *cmd)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt.target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt.target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt.target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt.target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 57183f4d46..2e9058b041 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -131,11 +134,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -208,8 +212,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -290,8 +298,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..ce1b7b4797
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,302 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write each chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/replication/basebackup_throttle.c b/src/backend/replication/basebackup_throttle.c
index f163931f8a..f5202bae87 100644
--- a/src/backend/replication/basebackup_throttle.c
+++ b/src/backend/replication/basebackup_throttle.c
@@ -122,7 +122,7 @@ bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
throttle((bbsink_throttle *) sink, len);
- bbsink_forward_manifest_contents(sink->bbs_next, len);
+ bbsink_forward_manifest_contents(sink, len);
}
/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 4a5b7502f5..3b5e6b799a 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
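(The two new wait events should be visible in pg_stat_activity's wait_event
column, with wait_event_type IO, whenever the server is writing or fsyncing
backup files, which ought to make it easier to see where a server-side
backup is spending its time.)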
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index ffeb6a3117..e8f76d2eb6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -109,7 +109,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -126,6 +126,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -357,6 +358,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1219,15 +1222,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ }
break;
}
@@ -1299,24 +1309,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed to inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1687,7 +1705,35 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1782,8 +1828,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first one since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1794,7 +1845,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1877,7 +1929,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2011,8 +2063,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2034,7 +2089,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2068,6 +2123,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2118,7 +2174,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2159,6 +2215,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2291,18 +2350,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2312,6 +2403,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal != NO_WAL)
+ {
+ pg_log_error("WAL cannot be included when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2328,6 +2429,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2361,8 +2465,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2383,6 +2497,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2390,6 +2505,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2399,6 +2517,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2441,11 +2562,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
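To illustrate the new option: with the server-side pieces from this patch
set in place, a backup stored on the server should be requestable with
something like "pg_basebackup -Xnone -t server:/backups/mybackup". -Xnone
is required because WAL can't be included when a backup target is given,
and -D and -F can't be combined with -t.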
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 36b9b76c5f..0e337a86f4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,9 +282,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index c22142365f..0b20981614 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
--
2.24.3 (Apple Git-128)
Attachment: v9-0001-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
From 6b15ff7e035587277c440ae7dd0218db339c6006 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 25 Oct 2021 15:41:43 -0400
Subject: [PATCH v9 1/3] Modify pg_basebackup to use a new COPY subprotocol for
base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
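To make the framing concrete, here's a rough sketch -- not code from the
patch, with the actual handling elided as comments -- of how a client can
dispatch on the type byte at the start of each CopyData payload; the real
logic is in ReceiveArchiveStreamChunk():

#include <stddef.h>

static void
dispatch_copydata(const char *payload, size_t len)
{
	if (len < 1)
		return;			/* empty message: protocol violation */

	switch (payload[0])
	{
		case 'n':		/* new archive: name and tablespace path follow */
			/* finish any prior archive, set up a streamer for this one */
			break;
		case 'd':		/* archive or manifest data */
			/* hand payload + 1, len - 1 to the current consumer */
			break;
		case 'p':		/* progress: 8-byte network-order byte count */
			/* update the progress display */
			break;
		case 'm':		/* manifest follows in subsequent 'd' messages */
			/* open a temporary file or buffer for the manifest */
			break;
		default:
			/* unknown type byte: protocol violation */
			break;
	}
}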
---
src/backend/replication/basebackup.c | 61 ++-
src/backend/replication/basebackup_copy.c | 277 +++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 443 +++++++++++++++++++---
src/include/replication/basebackup_sink.h | 1 +
4 files changed, 728 insertions(+), 54 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 38c82c4619..096455ad02 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -81,6 +88,7 @@ static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
bool sizeonly);
static void _tarWritePadding(bbsink *sink, int len);
+static void _tarEndArchive(bbsink *sink, backup_target_type target);
static void convert_link_to_directory(const char *pathbuf, struct stat *statbuf);
static void perform_base_backup(basebackup_options *opt, bbsink *sink);
static void parse_basebackup_options(List *options, basebackup_options *opt);
@@ -374,7 +382,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(&state);
@@ -611,6 +622,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ _tarEndArchive(sink, opt->target);
bbsink_end_archive(sink);
}
@@ -678,8 +690,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -820,6 +834,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -865,8 +895,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- /* Create a basic basebackup sink. */
- sink = bbsink_copytblspc_new();
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
@@ -1695,6 +1732,26 @@ _tarWritePadding(bbsink *sink, int len)
}
}
+/*
+ * Tar archives are supposed to end with two blocks of zeroes, so add those,
+ * unless we're using the old copy-tablespace protocol. In that system, the
+ * server must not properly terminate the archive, and the client is
+ * instead responsible for adding those two blocks of zeroes.
+ */
+static void
+_tarEndArchive(bbsink *sink, backup_target_type target)
+{
+ if (target != BACKUP_TARGET_COMPAT)
+ {
+ /* See comments in _tarWriteHeader for why this must be true. */
+ Assert(sink->bbs_buffer_length >= TAR_BLOCK_SIZE);
+
+ MemSet(sink->bbs_buffer, 0, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, TAR_BLOCK_SIZE);
+ }
+}
+
/*
* If the entry in statbuf is a link, then adjust statbuf to make it look like a
* directory, so that it will be written that way.
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 30bab4546e..57183f4d46 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result set has the same format in either method.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,52 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+static void bbsink_copystream_cleanup(bbsink *sink);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -38,6 +103,18 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup,
+ .cleanup = bbsink_copystream_cleanup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -50,6 +127,202 @@ const bbsink_ops bbsink_copytblspc_ops = {
.cleanup = bbsink_copytblspc_cleanup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins with
+ * the type byte we're going to need, and then arrange things so that
+ * the data we're given will be written just after that type byte. That
+ * will allow us to ship the data with a single call to pq_putmessage and
+ * without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Cleanup.
+ */
+static void
+bbsink_copystream_cleanup(bbsink *sink)
+{
+ /* Nothing to do. */
+}
+
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 169afa5645..ffeb6a3117 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -167,6 +177,13 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -981,10 +998,11 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
- * (2) we need to inject backup_manifest or recovery configuration into it.
+ * (2) we need to inject backup_manifest or recovery configuration into
+ * it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
- (spclocation == NULL && writerecoveryconf));
+ (spclocation == NULL && writerecoveryconf));
if (format == 'p')
{
@@ -1011,8 +1029,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* In tar format, we just write the archive without extracting it.
* Normally, we write it to the archive name provided by the caller,
- * but when the base directory is "-" that means we need to write
- * to standard output.
+ * but when the base directory is "-" that means we need to write to
+ * standard output.
*/
if (strcmp(basedir, "-") == 0)
{
@@ -1052,16 +1070,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're supposed to inject the backup manifest into the results,
- * it should be done here, so that the file content can be injected
- * directly, without worrying about the details of the tar format.
+ * If we're supposed to inject the backup manifest into the results, it
+ * should be done here, so that the file content can be injected directly,
+ * without worrying about the details of the tar format.
*/
if (inject_manifest)
manifest_inject_streamer = streamer;
/*
- * If this is the main tablespace and we're supposed to write
- * recovery information, arrange to do that.
+ * If this is the main tablespace and we're supposed to write recovery
+ * information, arrange to do that.
*/
if (spclocation == NULL && writerecoveryconf)
{
@@ -1072,8 +1090,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
}
/*
- * If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * If we're doing anything that involves understanding the contents of the
+ * archive, we'll need to parse it.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
@@ -1083,6 +1101,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1336,28 +1665,32 @@ BaseBackup(void)
}
if (maxrate > 0)
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
- maxrate);
+ maxrate);
if (format == 't')
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if (!verify_checksums)
{
if (use_new_option_syntax)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "VERIFY_CHECKSUMS", 0);
+ "VERIFY_CHECKSUMS", 0);
else
AppendPlainCommandOption(&buf, use_new_option_syntax,
- "NOVERIFY_CHECKSUMS");
+ "NOVERIFY_CHECKSUMS");
}
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1480,46 +1813,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index e6c073c567..36b9b76c5f 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,6 +282,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
Attachment: v9-0003-Server-side-gzip-compression.patch
From 919815b66abefe9b182f9e497914d0b0f3638f89 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 5 Nov 2021 10:05:02 -0400
Subject: [PATCH v9 3/3] Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
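For example, something like "pg_basebackup -Ft -D backupdir
--server-compression=gzip4" should produce base.tar.gz (plus one .tar.gz
per tablespace) compressed on the server at level 4, while the WAL fetched
over the separate connection remains uncompressed.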
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++-
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 304 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 44 +++-
src/include/replication/basebackup_sink.h | 1 +
7 files changed, 415 insertions(+), 5 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 90c366e8d3..a11800de65 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,31 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--server-compression=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Allows the tar files generated for each tablespace to be compressed
+ on the server, before they are sent to the client. The default value
+ is <literal>none</literal>, which performs no compression. If set
+ to <literal>gzip</literal>, compression is performed using gzip and
+ the suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9</literal>
+ will provide the maximum compression that the <literal>gzip</literal>
+ algorithm offers.
+ </para>
+ <para>
+ Since the write-ahead logs are fetched via a separate client
+ connection, they cannot be compressed using this option. See also
+ the <literal>--gzip</literal> and <literal>--compress</literal>
+ options.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-t <replaceable class="parameter">target</replaceable></option></term>
<term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
@@ -404,7 +429,9 @@ PostgreSQL documentation
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ automatically be added to all tar filenames. When this option is
+ used, compression is performed on the client side;
+ see also <literal>--server-compression</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ac1e0d8733..a7269779d0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -696,11 +704,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -870,6 +880,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -964,6 +999,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt.compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt.compression_level);
+
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
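Note the order in which the chain is assembled here: data flows from
bbsink_progress to bbsink_gzip to bbsink_throttle and on to the copystream
or server sink, so when both options are used, MAX_RATE throttling applies
to the compressed bytes rather than to the uncompressed input.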
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..432423bd55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,304 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As in bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index e8f76d2eb6..910143578c 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -366,13 +367,15 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
+ printf(_(" --server-compression=none|gzip|gzip[1-9]\n"
+ " compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
- printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -z, --gzip compress tar output on client\n"));
+ printf(_(" -Z, --compress=0-9 compress tar output on client with given compression level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -990,7 +993,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -999,14 +1004,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into
- * it.
+ * it. However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1737,6 +1760,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2147,6 +2181,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2326,6 +2361,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 0e337a86f4..6bfea35c22 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
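To make the zlib calls above easy to experiment with outside the server, here is a minimal standalone sketch of the same pattern (not part of the patch): deflateInit2() with windowBits = 15 + 16 to request a gzip header, a Z_NO_FLUSH loop while input remains, and a Z_FINISH loop to drain zlib's internal buffers. Compile with -lz; error handling is abbreviated.

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define CHUNK 32768

int
main(void)
{
    static char inbuf[CHUNK];
    static char outbuf[CHUNK];
    z_stream    zs;
    size_t      len;
    int         res;

    memset(&zs, 0, sizeof(zs));     /* NULL zalloc/zfree => zlib defaults */
    if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return 1;

    /* Compress stdin to stdout, one CHUNK of input at a time. */
    while ((len = fread(inbuf, 1, CHUNK, stdin)) > 0)
    {
        zs.next_in = (unsigned char *) inbuf;
        zs.avail_in = len;
        while (zs.avail_in > 0)
        {
            zs.next_out = (unsigned char *) outbuf;
            zs.avail_out = CHUNK;
            if (deflate(&zs, Z_NO_FLUSH) == Z_STREAM_ERROR)
                return 1;
            fwrite(outbuf, 1, CHUNK - zs.avail_out, stdout);
        }
    }

    /* No more input: drain whatever zlib has buffered internally. */
    do
    {
        zs.next_out = (unsigned char *) outbuf;
        zs.avail_out = CHUNK;
        res = deflate(&zs, Z_FINISH);
        fwrite(outbuf, 1, CHUNK - zs.avail_out, stdout);
    } while (res != Z_STREAM_END);

    deflateEnd(&zs);
    return 0;
}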
On Fri, Nov 5, 2021 at 11:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
I went ahead and committed 0001 and 0002, but got nervous about
proceeding with 0003.
It turns out that these commits are causing failures on prairiedog.
Per email from Tom off-list, that's apparently because prairiedog has
a fussy version of tar that doesn't like it when you omit the trailing
NUL blocks that are supposed to be part of a tar file. So how did this
get broken?
It turns out that in the current state of the world, the server sends
an almost-tarfile to the client. What I mean by an almost-tarfile is
that it sends something that looks like a valid tarfile except that
the two blocks of trailing NUL bytes are omitted. Prior to these
patches, that was a very strategic omission, because the pg_basebackup
code wants to edit the tar files, and it wasn't smart enough to parse
them, so it just received all the data from the server, then added any
members that it wanted to add (e.g. recovery.signal) and then added
the terminator itself. I would classify this as an ugly hack, but it
worked. With these changes, the client is now capable of really
parsing a tarfile, so it would have no problem injecting new files
into the archive whether or not the server terminates it properly. It
also has no problem adding the two blocks of terminating NUL bytes if
the server omits them, but not otherwise. All in all, it's
significantly smarter code.
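To make the "almost-tarfile" business concrete: a conforming ustar archive ends with two 512-byte blocks of zero bytes, so "terminating" what the server sends amounts to appending 1024 NUL bytes. A minimal sketch of that step, assuming the received archive is complete except for the trailer (the patch discussed below wraps the same idea in a bbstreamer):

#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/* Append the two all-zero blocks that end a POSIX ustar archive. */
static void
terminate_tar_stream(FILE *f)
{
    char        zeroes[2 * TAR_BLOCK_SIZE];

    memset(zeroes, 0, sizeof(zeroes));
    fwrite(zeroes, 1, sizeof(zeroes), f);
}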
However, I also set things up so that the client doesn't bother
parsing the tar file from the server if it's not doing anything that
requires editing the tar file on the fly. That saves some overhead,
and it's also important for the rest of the patch set, which wants to
make it so that the server could send us something besides a tarfile,
like maybe a .tar.gz. We can't just have a convention of adding 1024
NUL bytes to any file the server sends us unless what the server sends
us is always and precisely an unterminated tarfile. Unfortunately,
that means that in the case where the tar parsing logic isn't used,
the tar file ends up with the proper terminator. Because most 'tar'
implementations are happy to ignore that defect, the tests pass on my
machine, but not on prairiedog. I think I realized this problem at
some point during the development process of this patch, but then I
forgot about it again and ended up committing something that has a
problem of which, at some earlier point in time, I had been entirely
aware. Oops.
It's tempting to try to fix this problem by changing the server so
that it properly terminates the tar files it sends to the client.
Honestly, I don't know how we ever thought it was OK to design a
protocol for base backups that involved the server sending something
that is almost but not quite a valid tarfile. However, that's not
quite good enough, because pg_basebackup is supposed to be backward
compatible, so we'd still have the same problem if a new version of
pg_basebackup were used with an old server. So what I'm inclined to do
is fix both the server and pg_basebackup. On the server side, properly
terminate the tarfile. On the client side, if we're talking to a
pre-v15 server and don't need to parse the tarfile, blindly add 1024
NUL bytes at the end.
I think I can get patches for this done today. Please let me know ASAP
if you have objections to this line of attack.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
It turns out that these commits are causing failures on prairiedog.
Per email from Tom off-list, that's apparently because prairiedog has
a fussy version of tar that doesn't like it when you omit the trailing
NUL blocks that are supposed to be part of a tar file.
FTR, prairiedog is green. It's Noah's AIX menagerie that's complaining.
It's actually a little bit disturbing that we're only seeing a failure
on that one platform, because that means that nothing else is anchoring
us to the strict POSIX specification for tarfile format. We knew that
GNU tar is forgiving about missing trailing zero blocks, but apparently
so is BSD tar.
One part of me wants to add some explicit test for the trailing blocks.
Another says, well, the *de facto* tar standard seems not to require
the trailing blocks, never mind the letter of POSIX --- so when AIX
dies, will anyone care anymore? Maybe not.
regards, tom lane
On Mon, Nov 8, 2021 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
It turns out that these commits are causing failures on prairiedog.
Per email from Tom off-list, that's apparently because prairiedog has
a fussy version of tar that doesn't like it when you omit the trailing
NUL blocks that are supposed to be part of a tar file.
FTR, prairiedog is green. It's Noah's AIX menagerie that's complaining.
Woops.
It's actually a little bit disturbing that we're only seeing a failure
on that one platform, because that means that nothing else is anchoring
us to the strict POSIX specification for tarfile format. We knew that
GNU tar is forgiving about missing trailing zero blocks, but apparently
so is BSD tar.
Yeah.
One part of me wants to add some explicit test for the trailing blocks.
Another says, well, the *de facto* tar standard seems not to require
the trailing blocks, never mind the letter of POSIX --- so when AIX
dies, will anyone care anymore? Maybe not.
FWIW, I think both of those are pretty defensible positions. Honestly,
I'm not sure how likely the bug is to recur once we fix it here,
either. The only reason this is a problem is because of the kludge of
having the server generate the entire output file except for the last
1kB. If we eliminate that behavior I don't know that this particular
problem is especially likely to come back. But adding a test isn't
stupid either, just a bit tricky to write. When I was testing locally
this morning I found that there were considerably more than 1024 zero
bytes at the end of the file because the last file it backs up is
pg_control which ends with lots of zero bytes. So it's not sufficient
to just write a test that checks for non-zero bytes in the last 1kB of
the file. What I think you'd need to do is figure out the number of
files in the archive and the sizes of each one, and based on that work
out how big the tar archive should be: 512 bytes per file or directory
or symlink plus enough extra 512 byte chunks to cover the contents of
each file plus an extra 1024 bytes at the end. That doesn't seem
particularly simple to code. We could run 'tar tvf' and parse the
output to get the number of files and their lengths, but that seems
likely to cause more portability headaches than the underlying issue.
Since pg_basebackup now has the logic to do all of this parsing
internally, we could make it complain if it receives from a v15+
server an archive trailer that is not 1024 bytes of zeroes, but that
wouldn't help with this exact problem, because the issue in this case
is when pg_basebackup decides it doesn't need to parse in the first
place. We could add a pg_basebackup option
--force-parsing-and-check-if-the-server-seems-broken, but that seems
like overkill to me. So overall I'm inclined to just do nothing about
this unless someone has a better idea how to write a reasonable test.
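For reference, the size arithmetic described above might be sketched as follows, assuming each member uses exactly one ustar header block and no extended headers; file_sizes and nfiles are hypothetical inputs, and directories or symlinks would contribute a content size of zero:

#define TAR_BLOCK_SIZE 512

static long long
expected_tar_size(const long long *file_sizes, int nfiles)
{
    long long   total = 2 * TAR_BLOCK_SIZE;    /* trailing zero blocks */
    int         i;

    for (i = 0; i < nfiles; i++)
    {
        /* one header block per member */
        total += TAR_BLOCK_SIZE;
        /* content, padded out to a full 512-byte block */
        total += ((file_sizes[i] + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE)
            * TAR_BLOCK_SIZE;
    }
    return total;
}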
Anyway, here's my proposal for fixing the issue immediately before us.
0001 adds logic to pad out the unterminated tar archives, and 0002
makes the server terminate its tar archives while preserving the logic
added by 0001 for cases where we're talking to an older server. I
assume that it's best to get something committed quickly here so will
do that in ~4 hours if there are no major objections, or sooner if I
hear some enthusiastic endorsement.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
0001-Minimal-fix-for-unterminated-tar-archive-problem.patch
From 5fd91e9ae33876a06aec12b5e4b7358bd7247bca Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 8 Nov 2021 10:51:58 -0500
Subject: [PATCH 1/2] Minimal fix for unterminated tar archive problem.
Commit 23a1c6578c87fca0e361c4f5f9a07df5ae1f9858 improved
pg_basebackup's ability to parse tar archives, but also arranged
to parse them only when we need to make some modification to the
contents of the archive. That's a problem, because the server
doesn't actually terminate tar archives. When the new parsing
logic was engaged, pg_basebackup would properly terminate the
tar file, but when it was skipped, pg_basebackup would just write
whatever it got from the server, meaning that the terminator
was missing.
Most versions of tar are willing to overlook the missing terminator, but
the AIX buildfarm animals were not. Fix by inventing a new kind of
bbstreamer that just blindly adds a terminator, and using it whenever we
don't parse the tar archive.
---
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_tar.c | 72 ++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 6 ++-
3 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index b24dc848c1..2fd50b92d9 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
extern bbstreamer *bbstreamer_recovery_injector_new(bbstreamer *next,
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
index 5a9f587dca..5fded0f4e6 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -59,6 +59,19 @@ const bbstreamer_ops bbstreamer_tar_archiver_ops = {
.free = bbstreamer_tar_archiver_free
};
+static void bbstreamer_tar_terminator_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_tar_terminator_finalize(bbstreamer *streamer);
+static void bbstreamer_tar_terminator_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_tar_terminator_ops = {
+ .content = bbstreamer_tar_terminator_content,
+ .finalize = bbstreamer_tar_terminator_finalize,
+ .free = bbstreamer_tar_terminator_free
+};
+
/*
* Create a bbstreamer that can parse a stream of content as tar data.
*
@@ -442,3 +455,62 @@ bbstreamer_tar_archiver_free(bbstreamer *streamer)
bbstreamer_free(streamer->bbs_next);
pfree(streamer);
}
+
+/*
+ * Create a bbstreamer that blindly adds two blocks of NUL bytes to the
+ * end of an incomplete tarfile that the server might send us.
+ */
+bbstreamer *
+bbstreamer_tar_terminator_new(bbstreamer *next)
+{
+ bbstreamer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer));
+ *((const bbstreamer_ops **) &streamer->bbs_ops) =
+ &bbstreamer_tar_terminator_ops;
+ streamer->bbs_next = next;
+
+ return streamer;
+}
+
+/*
+ * Pass all the content through without change.
+ */
+static void
+bbstreamer_tar_terminator_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ /* Expect unparsed input. */
+ Assert(member == NULL);
+ Assert(context == BBSTREAMER_UNKNOWN);
+
+ /* Just forward it. */
+ bbstreamer_content(streamer->bbs_next, member, data, len, context);
+}
+
+/*
+ * At the end, blindly add the two blocks of NUL bytes which the server fails
+ * to supply.
+ */
+static void
+bbstreamer_tar_terminator_finalize(bbstreamer *streamer)
+{
+ char buffer[2 * TAR_BLOCK_SIZE];
+
+ memset(buffer, 0, 2 * TAR_BLOCK_SIZE);
+ bbstreamer_content(streamer->bbs_next, NULL, buffer,
+ 2 * TAR_BLOCK_SIZE, BBSTREAMER_UNKNOWN);
+ bbstreamer_finalize(streamer->bbs_next);
+}
+
+/*
+ * Free memory associated with a tar terminator.
+ */
+static void
+bbstreamer_tar_terminator_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer);
+}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 169afa5645..30efc03b83 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1073,10 +1073,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* If we're doing anything that involves understanding the contents of
- * the archive, we'll need to parse it.
+ * the archive, we'll need to parse it. If not, we can skip parsing it,
+ * but the tar files the server sends are not properly terminated, so
+ * we'll need to add the terminator here.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
+ else
+ streamer = bbstreamer_tar_terminator_new(streamer);
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
--
2.24.3 (Apple Git-128)
0002-Have-the-server-properly-terminate-tar-archives.patch
From 848fb86b0af45867593da36a142b00e0ea5bf64b Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 8 Nov 2021 11:20:25 -0500
Subject: [PATCH 2/2] Have the server properly terminate tar archives.
Earlier versions of PostgreSQL featured a version of pg_basebackup
that wanted to edit tar archives but was too dumb to parse them
properly. The server made things easier for the client by failing
to add the two blocks of zero bytes that ought to end a tar file,
leaving it up to the client to do that.
But since commit 23a1c6578c87fca0e361c4f5f9a07df5ae1f9858, we
don't need this hack any more, because pg_basebackup is now smarter
and can parse tar files even if they are properly terminated! So
change the server to always properly terminate the tar files. Older
versions of pg_basebackup can't talk to new servers anyway, so
there's no compatibility break.
On the pg_basebackup side, we still need to add the terminating
zero bytes if we're talking to an older server, but not when the
server is v15+. Hopefully at some point we'll be able to remove
some of this compatibility cruft, but it seems best to hang on to
it for now.
In passing, add a file header comment to bbstreamer_tar.c, to make
it clearer what's going on here.
---
src/backend/replication/basebackup.c | 16 ++++++++++++++++
src/bin/pg_basebackup/bbstreamer_tar.c | 10 ++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 25 +++++++++++++++++++------
3 files changed, 45 insertions(+), 6 deletions(-)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 38c82c4619..92430439f5 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -374,7 +374,16 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
Assert(lnext(state.tablespaces, lc) == NULL);
}
else
+ {
+ /* Properly terminate the tarfile. */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= 2 * BLCKSZ,
+ "BLCKSZ too small for 2 tar blocks");
+ memset(sink->bbs_buffer, 0, 2 * TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, 2 * TAR_BLOCK_SIZE);
+
+ /* OK, that's the end of the archive. */
bbsink_end_archive(sink);
+ }
}
basebackup_progress_wait_wal_archive(&state);
@@ -611,6 +620,13 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
sendFileWithContent(sink, pathbuf, "", &manifest);
}
+ /* Properly terminate the tar file. */
+ StaticAssertStmt(TAR_BLOCK_SIZE <= 2 * BLCKSZ,
+ "BLCKSZ too small for 2 tar blocks");
+ memset(sink->bbs_buffer, 0, 2 * TAR_BLOCK_SIZE);
+ bbsink_archive_contents(sink, 2 * TAR_BLOCK_SIZE);
+
+ /* OK, that's the end of the archive. */
bbsink_end_archive(sink);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_tar.c b/src/bin/pg_basebackup/bbstreamer_tar.c
index 5fded0f4e6..e6bd3ef52e 100644
--- a/src/bin/pg_basebackup/bbstreamer_tar.c
+++ b/src/bin/pg_basebackup/bbstreamer_tar.c
@@ -2,6 +2,16 @@
*
* bbstreamer_tar.c
*
+ * This module implements three types of tar processing. A tar parser
+ * expects unlabelled chunks of data (e.g. BBSTREAMER_UNKNOWN) and splits
+ * it into labelled chunks (any other value of bbstreamer_archive_context).
+ * A tar archiver does the reverse: it takes a bunch of labelled chunks
+ * and produces a tarfile, optionally replacing member headers and trailers
+ * so that upstream bbstreamer objects can perform surgery on the tarfile
+ * contents without knowing the details of the tar format. A tar terminator
+ * just adds two blocks of NUL bytes to the end of the file, since older
+ * server versions produce files with this terminator omitted.
+ *
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
*
* IDENTIFICATION
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 30efc03b83..1739ac6382 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -85,6 +85,12 @@ typedef void (*WriteDataCallback) (size_t nbytes, char *buf,
*/
#define MINIMUM_VERSION_FOR_MANIFESTS 130000
+/*
+ * Before v15, tar files received from the server will be improperly
+ * terminated.
+ */
+#define MINIMUM_VERSION_FOR_TERMINATED_TARFILE 150000
+
/*
* Different ways to include WAL
*/
@@ -166,7 +172,8 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
- bool is_recovery_guc_supported);
+ bool is_recovery_guc_supported,
+ bool expect_unterminated_tarfile);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -965,7 +972,8 @@ ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
static bbstreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
- bool is_recovery_guc_supported)
+ bool is_recovery_guc_supported,
+ bool expect_unterminated_tarfile)
{
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
@@ -1074,12 +1082,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/*
* If we're doing anything that involves understanding the contents of
* the archive, we'll need to parse it. If not, we can skip parsing it,
- * but the tar files the server sends are not properly terminated, so
- * we'll need to add the terminator here.
+ * but old versions of the server send improperly terminated tarfiles,
+ * so if we're talking to such a server we'll need to add the terminator
+ * here.
*/
if (must_parse_archive)
streamer = bbstreamer_tar_parser_new(streamer);
- else
+ else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
/* Return the results. */
@@ -1099,14 +1108,18 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
WriteTarState state;
bbstreamer *manifest_inject_streamer;
bool is_recovery_guc_supported;
+ bool expect_unterminated_tarfile;
/* Pass all COPY data through to the backup streamer. */
memset(&state, 0, sizeof(state));
is_recovery_guc_supported =
PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
+ expect_unterminated_tarfile =
+ PQserverVersion(conn) < MINIMUM_VERSION_FOR_TERMINATED_TARFILE;
state.streamer = CreateBackupStreamer(archive_name, spclocation,
&manifest_inject_streamer,
- is_recovery_guc_supported);
+ is_recovery_guc_supported,
+ expect_unterminated_tarfile);
state.tablespacenum = tablespacenum;
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
progress_filename = NULL;
--
2.24.3 (Apple Git-128)
On Mon, Nov 8, 2021 at 11:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
Anyway, here's my proposal for fixing the issue immediately before us.
0001 adds logic to pad out the unterminated tar archives, and 0002
makes the server terminate its tar archives while preserving the logic
added by 0001 for cases where we're talking to an older server. I
assume that it's best to get something committed quickly here so will
do that in ~4 hours if there are no major objections, or sooner if I
hear some enthusiastic endorsement.
I have now committed 0001 and will wait to see what the buildfarm
thinks about that before doing anything more.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Nov 8, 2021 at 4:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 8, 2021 at 11:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
Anyway, here's my proposal for fixing the issue immediately before us.
0001 adds logic to pad out the unterminated tar archives, and 0002
makes the server terminate its tar archives while preserving the logic
added by 0001 for cases where we're talking to an older server. I
assume that it's best to get something committed quickly here so will
do that in ~4 hours if there are no major objections, or sooner if I
hear some enthusiastic endorsement.
I have now committed 0001 and will wait to see what the buildfarm
thinks about that before doing anything more.
It seemed OK, so I have now committed 0002 as well.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Nov 05, 2021 at 11:50:01AM -0400, Robert Haas wrote:
On Tue, Nov 2, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
Meanwhile, I think it's probably OK for me to go ahead and commit
0001-0003 from my patches at this point, since it seems we have pretty
good evidence that the abstraction basically works, and there doesn't
seem to be any value in holding off and maybe having to do a bunch
more rebasing.
I went ahead and committed 0001 and 0002, but got nervous about
proceeding with 0003.
Hi,
I'm observing a strange issue which I can only relate to bef47ff85d
where the bbsink abstraction was introduced. The problem is a failing
assertion when doing:
DETAIL: Failed process was running: BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, WAIT 0, MAX_RATE 102400, MANIFEST 'yes')
Walsender tries to send a backup manifest, but crashes on the throttling sink:
#2 0x0000560857b551af in ExceptionalCondition (conditionName=0x560857d15d27 "sink->bbs_next != NULL", errorType=0x560857d15c23 "FailedAssertion", fileName=0x560857d15d15 "basebackup_sink.c", lineNumber=91) at assert.c:69
#3 0x0000560857918a94 in bbsink_forward_manifest_contents (sink=0x5608593f73f8, len=32768) at basebackup_sink.c:91
#4 0x0000560857918d68 in bbsink_throttle_manifest_contents (sink=0x5608593f7450, len=32768) at basebackup_throttle.c:125
#5 0x00005608579186d0 in bbsink_manifest_contents (sink=0x5608593f7450, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
#6 0x0000560857918b1b in bbsink_forward_manifest_contents (sink=0x5608593f74e8, len=32768) at basebackup_sink.c:94
#7 0x0000560857911edc in bbsink_manifest_contents (sink=0x5608593f74e8, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
#8 0x00005608579129f6 in SendBackupManifest (manifest=0x7ffdaea9d120, sink=0x5608593f74e8) at backup_manifest.c:373
Looking at the similar bbsink_throttle_archive_contents, it's not clear
why the comments for both functions (archive and manifest throttling) say
"pass archive contents to next sink", but only bbsink_throttle_manifest_contents
passes bbs_next into bbsink_forward_manifest_contents. Is it
supposed to be like that? Passing the same sink object instead of the next
one into bbsink_forward_manifest_contents seems to solve the problem in
this case.
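If that diagnosis is right, the fix would be a one-liner along these lines (an untested sketch; the throttling bookkeeping is elided, since the point is only which sink pointer gets handed to the forwarding helper, which advances to bbs_next itself):

static void
bbsink_throttle_manifest_contents(bbsink *sink, size_t len)
{
    /* ... rate-limiting bookkeeping elided ... */

    /* was: bbsink_forward_manifest_contents(sink->bbs_next, len); */
    bbsink_forward_manifest_contents(sink, len);
}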
On Mon, Nov 15, 2021 at 11:25 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Walsender tries to send a backup manifest, but crashes on the throttling sink:
#2 0x0000560857b551af in ExceptionalCondition (conditionName=0x560857d15d27 "sink->bbs_next != NULL", errorType=0x560857d15c23 "FailedAssertion", fileName=0x560857d15d15 "basebackup_sink.c", lineNumber=91) at assert.c:69
#3 0x0000560857918a94 in bbsink_forward_manifest_contents (sink=0x5608593f73f8, len=32768) at basebackup_sink.c:91
#4 0x0000560857918d68 in bbsink_throttle_manifest_contents (sink=0x5608593f7450, len=32768) at basebackup_throttle.c:125
#5 0x00005608579186d0 in bbsink_manifest_contents (sink=0x5608593f7450, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
#6 0x0000560857918b1b in bbsink_forward_manifest_contents (sink=0x5608593f74e8, len=32768) at basebackup_sink.c:94
#7 0x0000560857911edc in bbsink_manifest_contents (sink=0x5608593f74e8, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
#8 0x00005608579129f6 in SendBackupManifest (manifest=0x7ffdaea9d120, sink=0x5608593f74e8) at backup_manifest.c:373
Looking at the similar bbsink_throttle_archive_contents, it's not clear
why the comments for both functions (archive and manifest throttling) say
"pass archive contents to next sink", but only bbsink_throttle_manifest_contents
passes bbs_next into bbsink_forward_manifest_contents. Is it
supposed to be like that? Passing the same sink object instead of the next
one into bbsink_forward_manifest_contents seems to solve the problem in
this case.
Yeah, that's what it should be doing. I'll commit a fix, thanks for
the report and diagnosis.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Nov 15, 2021 at 2:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, that's what it should be doing. I'll commit a fix, thanks for
the report and diagnosis.
Here's a new patch set.
0001 - When I committed the patch to add the missing 2 blocks of zero
bytes to the tar archives generated by the server, I failed to adjust
the documentation. So 0001 does that. This is the only new patch in
the series. I was not sure whether to just remove the statement from
the documentation saying that those blocks aren't included, or whether
to mention that we used to include them and no longer do. I went for
the latter; opinions welcome.
0002 - This adds a new COPY subprotocol for taking base backups. I've
improved it over the previous version by adding documentation. I'm
still seeking comments on the points I raised in
/messages/by-id/CA+TgmobrOXbDh+hCzzVkD3weV3R-QRy3SPa=FRb_Rv9wF5iPJw@mail.gmail.com
but what I'm leaning toward doing is committing the patch as is and
then submitting a patch - or maybe several patches - later to rip some of this
and a few other old things out. That way the debate - or lack thereof
- about what to do here doesn't have to block the main patch set, and
also, it feels safer to make removing the existing stuff a separate
effort rather than doing it now.
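As a rough illustration of what the new subprotocol asks of a client, the dispatch on the type byte that prefixes each CopyData payload might look like this; the message types 'n', 'm', 'd', and 'p' are the ones defined in the patch, while the handler functions here are hypothetical:

#include <stddef.h>

/* hypothetical handlers, not part of the patch */
extern void handle_new_archive(const char *payload, size_t len);
extern void handle_manifest_start(void);
extern void handle_data(const char *payload, size_t len);
extern void handle_progress(const char *payload, size_t len);

static void
handle_copy_data_message(const char *buf, size_t len)
{
    if (len < 1)
        return;                 /* real code would report a protocol error */

    switch (buf[0])
    {
        case 'n':               /* new archive: name + tablespace path */
            handle_new_archive(buf + 1, len - 1);
            break;
        case 'm':               /* start of the backup manifest */
            handle_manifest_start();
            break;
        case 'd':               /* archive or manifest data */
            handle_data(buf + 1, len - 1);
            break;
        case 'p':               /* progress report: int64 bytes done */
            handle_progress(buf + 1, len - 1);
            break;
        default:
            break;              /* unknown type: error out in real code */
    }
}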
0003 - This adds "server" and "blackhole" as backup targets. In this
version, I've improved the documentation. Also, the previous version
only let you use a backup target with -Xnone, and I realized that was
stupid. -Xfetch is OK too. -Xstream still doesn't work, since that's
implemented via client-side logic. I think this still needs some work
to be committable, like adding tests, but I don't expect to make any
major changes.
0004 - Server-side gzip compression. Similar level of maturity to 0003.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v10-0001-Document-that-tar-archives-are-now-properly-term.patch
From 0e1215bc322176ea2abe4b1a800f0c4d51fb92da Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 16 Nov 2021 11:02:06 -0500
Subject: [PATCH v10 1/4] Document that tar archives are now properly
terminated.
Commit 5a1007a5088cd6ddf892f7422ea8dbaef362372f changed the server
behavior, but I didn't notice that the existing behavior was
documented, and therefore did not update the documentation.
This commit does that.
I chose to mention that the behavior has changed rather than just
removing the reference to a deviation from a standard. It seemed
like that might be helpful to tool authors.
---
doc/src/sgml/protocol.sgml | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e59216e7f2..34a7034282 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2809,8 +2809,10 @@ The commands accepted in replication mode are:
than <literal>pg_default</literal> and <literal>pg_global</literal>. The data in
the CopyOutResponse results will be a tar format (following the
<quote>ustar interchange format</quote> specified in the POSIX 1003.1-2008
- standard) dump of the tablespace contents, except that the two trailing
- blocks of zeroes specified in the standard are omitted.
+ standard) dump of the tablespace contents. Prior to
+ <literal>PostgreSQL</literal> 15, the server omitted the two trailing
+ blocks of zeroes specified in the standard, but this is no longer the
+ case.
After the tar data is complete, and if a backup manifest was requested,
another CopyOutResponse result is sent, containing the manifest data for the
current base backup. In any case, a final ordinary result set will be
--
2.24.3 (Apple Git-128)
v10-0002-Modify-pg_basebackup-to-use-a-new-COPY-subprotoc.patch
From fa3c4a712def2bba961b69cc8bff94ebfaa09c56 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 16 Nov 2021 13:17:48 -0500
Subject: [PATCH v10 2/4] Modify pg_basebackup to use a new COPY subprotocol
for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
When the new protocol is used, the server generates properly terminated
tar archives, in contrast to the old one which intentionally leaves out
the two blocks of zero bytes that are supposed to occur at the end of
each tar file. Any version of pg_basebackup new enough to support the
new protocol is also smart enough not to be confused by these padding
blocks, so we need not propagate this kluge.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
---
doc/src/sgml/protocol.sgml | 130 ++++++-
src/backend/replication/basebackup.c | 36 +-
src/backend/replication/basebackup_copy.c | 277 ++++++++++++++-
src/bin/pg_basebackup/pg_basebackup.c | 410 ++++++++++++++++++++--
src/include/replication/basebackup_sink.h | 1 +
5 files changed, 806 insertions(+), 48 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 34a7034282..7e59edb1cc 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2630,6 +2630,22 @@ The commands accepted in replication mode are:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>TARGET</literal> <replaceable>'target'</replaceable></term>
+ <listitem>
+ <para>
+ Tells the server where to send the backup. If not specified,
+ the legacy base backup protocol will be used. Otherwise, the new
+ protocol will be used, as described below.
+ </para>
+
+ <para>
+ At present, the only supported value for this parameter is
+ <literal>client</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>PROGRESS [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
@@ -2805,19 +2821,113 @@ The commands accepted in replication mode are:
<para>
After the second regular result set, one or more CopyOutResponse results
- will be sent, one for the main data directory and one for each additional tablespace other
- than <literal>pg_default</literal> and <literal>pg_global</literal>. The data in
- the CopyOutResponse results will be a tar format (following the
- <quote>ustar interchange format</quote> specified in the POSIX 1003.1-2008
- standard) dump of the tablespace contents. Prior to
+ will be sent. If the <literal>TARGET</literal> option is not specified,
+ the legacy base backup protocol will be used. In this mode,
+ there will be one CopyOutResponse for the main directory, one for each
+ additional tablespace other than <literal>pg_default</literal> and
+ <literal>pg_global</literal>, and one for the backup manifest if
+ requested. The main data directory and any additional tablespaces will
+ be sent in tar format (following the <quote>ustar interchange
+ format</quote> specified in the POSIX 1003.1-2008 standard), and
+ the manifest will be sent as a plain file. Prior to
<literal>PostgreSQL</literal> 15, the server omitted the two trailing
blocks of zeroes specified in the standard, but this is no longer the
case.
- After the tar data is complete, and if a backup manifest was requested,
- another CopyOutResponse result is sent, containing the manifest data for the
- current base backup. In any case, a final ordinary result set will be
- sent, containing the WAL end position of the backup, in the same format as
- the start position.
+ </para>
+
+ <para>
+ New applications should specify the <literal>TARGET</literal> option.
+ When that option is used, a single CopyOutResponse will be sent, and
+ the payload of each CopyData message will contain a message in one of
+ the following formats:
+ </para>
+
+ <para>
+ <variablelist>
+
+ <varlistentry>
+ <term>new archive (B)</term>
+ <listitem><para><variablelist>
+ <varlistentry>
+ <term>Byte1('n')</term>
+ <listitem><para>
+ Identifies the message as indicating the start of a new archive.
+ </para></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>String</term>
+ <listitem><para>
+ The file name for this archive.
+ </para></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>String</term>
+ <listitem><para>
+ For the main data directory, an empty string. For other
+ tablespaces, the full path to the directory from which this
+ archive was created.
+ </para></listitem>
+ </varlistentry>
+ </variablelist></para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>manifest (B)</term>
+ <listitem><para><variablelist>
+ <varlistentry>
+ <term>Byte1('m')</term>
+ <listitem><para>
+ Identifies the message as indicating the start of the backup
+ manifest.
+ </para></listitem>
+ </varlistentry>
+ </variablelist></para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>archive or manifest data (B)</term>
+ <listitem><para><variablelist>
+ <varlistentry>
+ <term>Byte1('d')</term>
+ <listitem><para>
+ Identifies the message as containing archive or manifest data.
+ </para></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Byte<replaceable>n</replaceable></term>
+ <listitem><para>
+ Data bytes.
+ </para></listitem>
+ </varlistentry>
+ </variablelist></para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>progress report (B)</term>
+ <listitem><para><variablelist>
+ <varlistentry>
+ <term>Byte1('p')</term>
+ <listitem><para>
+ Identifies the message as a progress report.
+ </para></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Int64</term>
+ <listitem><para>
+ The number of bytes from the current tablespace for which
+ processing has been completed.
+ </para></listitem>
+ </varlistentry>
+ </variablelist></para></listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ <para>
+ After the CopyOutResponse, or all such responses, have been sent, a
+ final ordinary result set will be sent, containing the WAL end position
+ of the backup, in the same format as the start position.
</para>
<para>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ec0485705d..d0d5acbf26 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -53,6 +53,12 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
+typedef enum
+{
+ BACKUP_TARGET_COMPAT,
+ BACKUP_TARGET_CLIENT
+} backup_target_type;
+
typedef struct
{
const char *label;
@@ -62,6 +68,7 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
+ backup_target_type target;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -694,8 +701,10 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_noverify_checksums = false;
bool o_manifest = false;
bool o_manifest_checksums = false;
+ bool o_target = false;
MemSet(opt, 0, sizeof(*opt));
+ opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
@@ -836,6 +845,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
optval)));
o_manifest_checksums = true;
}
+ else if (strcmp(defel->defname, "target") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "client") == 0)
+ opt->target = BACKUP_TARGET_CLIENT;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized target: \"%s\"", optval)));
+ o_target = true;
+ }
else
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
@@ -881,8 +906,15 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- /* Create a basic basebackup sink. */
- sink = bbsink_copytblspc_new();
+ /*
+ * If the TARGET option was specified, then we can use the new copy-stream
+ * protocol. If not, we must fall back to the old and less capable
+ * copy-tablespace protocol.
+ */
+ if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new();
+ else
+ sink = bbsink_copytblspc_new();
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 30bab4546e..57183f4d46 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -1,8 +1,27 @@
/*-------------------------------------------------------------------------
*
* basebackup_copy.c
- * send basebackup archives using one COPY OUT operation per
- * tablespace, and an additional COPY OUT for the backup manifest
+ * send basebackup archives using COPY OUT
+ *
+ * We have two different ways of doing this.
+ *
+ * 'copytblspc' is an older method still supported for compatibility
+ * with releases prior to v15. In this method, a separate COPY OUT
+ * operation is used for each tablespace. The manifest, if it is sent,
+ * uses an additional COPY OUT operation.
+ *
+ * 'copystream' starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that
+ * single COPY OUT. Each CopyData message begins with a type byte,
+ * allowing us to signal the start of a new archive, or the manifest,
+ * by some means other than ending the COPY stream. This also allows
+ * this protocol to be extended more easily, since we can include
+ * arbitrary information in the message stream as long as we're certain
+ * that the client will know what to do with it.
+ *
+ * Regardless of which method is used, we send a result set with
+ * information about the tablespaces to be included in the backup before
+ * starting COPY OUT. This result has the same format in every method.
*
* Portions Copyright (c) 2010-2021, PostgreSQL Global Development Group
*
@@ -18,6 +37,52 @@
#include "libpq/pqformat.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "utils/timestamp.h"
+
+typedef struct bbsink_copystream
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /*
+ * Protocol message buffer. We assemble CopyData protocol messages by
+ * setting the first character of this buffer to 'd' (archive or manifest
+ * data) and then making base.bbs_buffer point to the second character so
+ * that the rest of the data gets copied into the message just where we
+ * want it.
+ */
+ char *msgbuffer;
+
+ /*
+ * When did we last report progress to the client, and how much progress
+ * did we report?
+ */
+ TimestampTz last_progress_report_time;
+ uint64 bytes_done_at_last_time_check;
+} bbsink_copystream;
+
+/*
+ * We don't want to send progress messages to the client excessively
+ * frequently. Ideally, we'd like to send a message when the time since the
+ * last message reaches PROGRESS_REPORT_MILLISECOND_THRESHOLD, but checking
+ * the system time every time we send a tiny bit of data seems too expensive.
+ * So we only check it after the number of bytes since the last check reaches
+ * PROGRESS_REPORT_BYTE_INTERVAL.
+ */
+#define PROGRESS_REPORT_BYTE_INTERVAL 65536
+#define PROGRESS_REPORT_MILLISECOND_THRESHOLD 1000
+
+static void bbsink_copystream_begin_backup(bbsink *sink);
+static void bbsink_copystream_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_copystream_archive_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_archive(bbsink *sink);
+static void bbsink_copystream_begin_manifest(bbsink *sink);
+static void bbsink_copystream_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_copystream_end_manifest(bbsink *sink);
+static void bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+static void bbsink_copystream_cleanup(bbsink *sink);
static void bbsink_copytblspc_begin_backup(bbsink *sink);
static void bbsink_copytblspc_begin_archive(bbsink *sink,
@@ -38,6 +103,18 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
static void SendTablespaceList(List *tablespaces);
static void send_int8_string(StringInfoData *buf, int64 intval);
+const bbsink_ops bbsink_copystream_ops = {
+ .begin_backup = bbsink_copystream_begin_backup,
+ .begin_archive = bbsink_copystream_begin_archive,
+ .archive_contents = bbsink_copystream_archive_contents,
+ .end_archive = bbsink_copystream_end_archive,
+ .begin_manifest = bbsink_copystream_begin_manifest,
+ .manifest_contents = bbsink_copystream_manifest_contents,
+ .end_manifest = bbsink_copystream_end_manifest,
+ .end_backup = bbsink_copystream_end_backup,
+ .cleanup = bbsink_copystream_cleanup
+};
+
const bbsink_ops bbsink_copytblspc_ops = {
.begin_backup = bbsink_copytblspc_begin_backup,
.begin_archive = bbsink_copytblspc_begin_archive,
@@ -50,6 +127,202 @@ const bbsink_ops bbsink_copytblspc_ops = {
.cleanup = bbsink_copytblspc_cleanup
};
+/*
+ * Create a new 'copystream' bbsink.
+ */
+bbsink *
+bbsink_copystream_new(void)
+{
+ bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+
+ /* Set up for periodic progress reporting. */
+ sink->last_progress_report_time = GetCurrentTimestamp();
+ sink->bytes_done_at_last_time_check = UINT64CONST(0);
+
+ return &sink->base;
+}
+
+/*
+ * Send start-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_begin_backup(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = sink->bbs_state;
+
+ /*
+ * Initialize buffer. We ultimately want to send the archive and manifest
+ * data by means of CopyData messages where the payload portion of each
+ * message begins with a type byte, so we set up a buffer that begins with
+ * the type byte we're going to need, and then arrange things so that
+ * the data we're given will be written just after that type byte. That
+ * will allow us to ship the data with a single call to pq_putmessage and
+ * without needing any extra copying.
+ */
+ mysink->msgbuffer = palloc(mysink->base.bbs_buffer_length + 1);
+ mysink->base.bbs_buffer = mysink->msgbuffer + 1;
+ mysink->msgbuffer[0] = 'd'; /* archive or manifest data */
+
+ /* Tell client the backup start location. */
+ SendXlogRecPtrResult(state->startptr, state->starttli);
+
+ /* Send client a list of tablespaces. */
+ SendTablespaceList(state->tablespaces);
+
+ /* Send a CommandComplete message */
+ pq_puttextmessage('C', "SELECT");
+
+ /* Begin COPY stream. This will be used for all archives + manifest. */
+ SendCopyOutResponse();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of a new archive.
+ */
+static void
+bbsink_copystream_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_state *state = sink->bbs_state;
+ tablespaceinfo *ti;
+ StringInfoData buf;
+
+ ti = list_nth(state->tablespaces, state->tablespace_num);
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'n'); /* New archive */
+ pq_sendstring(&buf, archive_name);
+ pq_sendstring(&buf, ti->path == NULL ? "" : ti->path);
+ pq_endmessage(&buf);
+}
+
+/*
+ * Send a CopyData message containing a chunk of archive content.
+ */
+static void
+bbsink_copystream_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+ uint64 targetbytes;
+
+ /* Send the archive content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+
+ /* Consider whether to send a progress report to the client. */
+ targetbytes = mysink->bytes_done_at_last_time_check
+ + PROGRESS_REPORT_BYTE_INTERVAL;
+ if (targetbytes <= state->bytes_done)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long ms;
+
+ /*
+ * OK, we've sent a decent number of bytes, so check the system time
+ * to see whether we're due to send a progress report.
+ */
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ ms = TimestampDifferenceMilliseconds(mysink->last_progress_report_time,
+ now);
+
+ /*
+ * Send a progress report if enough time has passed. Also send one if
+ * the system clock was set backward, so that such occurrences don't
+ * have the effect of suppressing further progress messages.
+ */
+ if (ms < 0 || ms >= PROGRESS_REPORT_MILLISECOND_THRESHOLD)
+ {
+ mysink->last_progress_report_time = now;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+ }
+ }
+}
+
+/*
+ * We don't need to explicitly signal the end of the archive; the client
+ * will figure out that we've reached the end when we begin the next one,
+ * or begin the manifest, or end the COPY stream. However, this seems like
+ * a good time to force out a progress report. One reason for that is that
+ * if this is the last archive, and we don't force a progress report now,
+ * the client will never be told that we sent all the bytes.
+ */
+static void
+bbsink_copystream_end_archive(bbsink *sink)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+ bbsink_state *state = mysink->base.bbs_state;
+ StringInfoData buf;
+
+ mysink->bytes_done_at_last_time_check = state->bytes_done;
+ mysink->last_progress_report_time = GetCurrentTimestamp();
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'p'); /* Progress report */
+ pq_sendint64(&buf, state->bytes_done);
+ pq_endmessage(&buf);
+ pq_flush_if_writable();
+}
+
+/*
+ * Send a CopyData message announcing the beginning of the backup manifest.
+ */
+static void
+bbsink_copystream_begin_manifest(bbsink *sink)
+{
+ StringInfoData buf;
+
+ pq_beginmessage(&buf, 'd'); /* CopyData */
+ pq_sendbyte(&buf, 'm'); /* Manifest */
+ pq_endmessage(&buf);
+}
+
+/*
+ * Each chunk of manifest data is sent using a CopyData message.
+ */
+static void
+bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_copystream *mysink = (bbsink_copystream *) sink;
+
+ /* Send the manifest content to the client (with leading type byte). */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+}
+
+/*
+ * We don't need an explicit terminator for the backup manifest.
+ */
+static void
+bbsink_copystream_end_manifest(bbsink *sink)
+{
+ /* Do nothing. */
+}
+
+/*
+ * Send end-of-backup wire protocol messages.
+ */
+static void
+bbsink_copystream_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ SendCopyDone();
+ SendXlogRecPtrResult(endptr, endtli);
+}
+
+/*
+ * Cleanup.
+ */
+static void
+bbsink_copystream_cleanup(bbsink *sink)
+{
+ /* Nothing to do. */
+}
+
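Stepping back, the full conversation this sink conducts with the client, as
implemented by the callbacks above, is:

    /*
     * begin_backup:  start LSN/TLI result set, tablespace list,
     *                CommandComplete, CopyOutResponse
     * per archive:   'n' message, then 'd' data messages, with 'p'
     *                progress reports interleaved as needed
     * manifest:      'm' message, then 'd' data messages
     * end_backup:    CopyDone, then end LSN/TLI result set
     */
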
/*
* Create a new 'copytblspc' bbsink.
*/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1739ac6382..47656fc060 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -54,6 +54,16 @@ typedef struct TablespaceList
TablespaceListCell *tail;
} TablespaceList;
+typedef struct ArchiveStreamState
+{
+ int tablespacenum;
+ bbstreamer *streamer;
+ bbstreamer *manifest_inject_streamer;
+ PQExpBuffer manifest_buffer;
+ char manifest_filename[MAXPGPATH];
+ FILE *manifest_file;
+} ArchiveStreamState;
+
typedef struct WriteTarState
{
int tablespacenum;
@@ -174,6 +184,13 @@ static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
bool expect_unterminated_tarfile);
+static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
+ void *callback_data);
+static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
+static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
+static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
+static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
+static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
bool tablespacenum);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
@@ -1096,6 +1113,317 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
return streamer;
}
+/*
+ * Receive all of the archives the server wants to send - and the backup
+ * manifest if present - as a single COPY stream.
+ */
+static void
+ReceiveArchiveStream(PGconn *conn)
+{
+ ArchiveStreamState state;
+
+ /* Set up initial state. */
+ memset(&state, 0, sizeof(state));
+ state.tablespacenum = -1;
+
+ /* All the real work happens in ReceiveArchiveStreamChunk. */
+ ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
+
+ /* If we wrote the backup manifest to a file, close the file. */
+ if (state.manifest_file != NULL)
+ {
+ fclose(state.manifest_file);
+ state.manifest_file = NULL;
+ }
+
+ /*
+ * If we buffered the backup manifest in order to inject it into the
+ * output tarfile, do that now.
+ */
+ if (state.manifest_inject_streamer != NULL &&
+ state.manifest_buffer != NULL)
+ {
+ bbstreamer_inject_file(state.manifest_inject_streamer,
+ "backup_manifest",
+ state.manifest_buffer->data,
+ state.manifest_buffer->len);
+ destroyPQExpBuffer(state.manifest_buffer);
+ state.manifest_buffer = NULL;
+ }
+
+ /* If there's still an archive in progress, end processing. */
+ if (state.streamer != NULL)
+ {
+ bbstreamer_finalize(state.streamer);
+ bbstreamer_free(state.streamer);
+ state.streamer = NULL;
+ }
+}
+
+/*
+ * Receive one chunk of data sent by the server as part of a single COPY
+ * stream that includes all archives and the manifest.
+ */
+static void
+ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
+{
+ ArchiveStreamState *state = callback_data;
+ size_t cursor = 0;
+
+ /* Each CopyData message begins with a type byte. */
+ switch (GetCopyDataByte(r, copybuf, &cursor))
+ {
+ case 'n':
+ {
+ /* New archive. */
+ char *archive_name;
+ char *spclocation;
+
+ /*
+ * We force a progress report at the end of each tablespace. A
+ * new tablespace starts when the previous one ends, except in
+ * the case of the very first one.
+ */
+ if (++state->tablespacenum > 0)
+ progress_report(state->tablespacenum, true, false);
+
+ /* Sanity check. */
+ if (state->manifest_buffer != NULL ||
+ state->manifest_file != NULL)
+ {
+ pg_log_error("archives should precede manifest");
+ exit(1);
+ }
+
+ /* Parse the rest of the CopyData message. */
+ archive_name = GetCopyDataString(r, copybuf, &cursor);
+ spclocation = GetCopyDataString(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * Basic sanity checks on the archive name: it shouldn't be
+ * empty, it shouldn't start with a dot, and it shouldn't
+ * contain a path separator.
+ */
+ if (archive_name[0] == '\0' || archive_name[0] == '.' ||
+ strchr(archive_name, '/') != NULL ||
+ strchr(archive_name, '\\') != NULL)
+ {
+ pg_log_error("invalid archive name: \"%s\"",
+ archive_name);
+ exit(1);
+ }
+
+ /*
+ * An empty spclocation is treated as NULL. We expect this
+ * case to occur for the data directory itself, but not for
+ * any archives that correspond to tablespaces.
+ */
+ if (spclocation[0] == '\0')
+ spclocation = NULL;
+
+ /* End processing of any prior archive. */
+ if (state->streamer != NULL)
+ {
+ bbstreamer_finalize(state->streamer);
+ bbstreamer_free(state->streamer);
+ state->streamer = NULL;
+ }
+
+ /*
+ * Create an appropriate backup streamer. We know that
+ * recovery GUCs are supported, because this protocol can only
+ * be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true, false);
+ break;
+ }
+
+ case 'd':
+ {
+ /* Archive or manifest data. */
+ if (state->manifest_buffer != NULL)
+ {
+ /* Manifest data, buffer in memory. */
+ appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
+ r - 1);
+ }
+ else if (state->manifest_file != NULL)
+ {
+ /* Manifest data, write to disk. */
+ if (fwrite(copybuf + 1, r - 1, 1,
+ state->manifest_file) != 1)
+ {
+ /*
+ * If fwrite() didn't set errno, assume that the
+ * problem is that we're out of disk space.
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ else if (state->streamer != NULL)
+ {
+ /* Archive data. */
+ bbstreamer_content(state->streamer, NULL, copybuf + 1,
+ r - 1, BBSTREAMER_UNKNOWN);
+ }
+ else
+ {
+ pg_log_error("unexpected payload data");
+ exit(1);
+ }
+ break;
+ }
+
+ case 'p':
+ {
+ /*
+ * Progress report.
+ *
+ * The remainder of the message is expected to be an 8-byte
+ * count of bytes completed.
+ */
+ totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * The server shouldn't send progress report messages too
+ * often, so we force an update each time we receive one.
+ */
+ progress_report(state->tablespacenum, true, false);
+ break;
+ }
+
+ case 'm':
+ {
+ /*
+ * Manifest data will be sent next. This message is not
+ * expected to have any further payload data.
+ */
+ GetCopyDataEnd(r, copybuf, cursor);
+
+ /*
+ * If we're supposed to inject the manifest into the archive, we
+ * prepare to buffer it in memory; otherwise, we prepare to
+ * write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
+ {
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
+ }
+ break;
+ }
+
+ default:
+ ReportCopyDataParseError(r, copybuf);
+ break;
+ }
+}
+
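For reference, the CopyData payload formats this function decodes -- an
informal summary of the cases above:

    /*
     *  'n' archive_name '\0' spclocation '\0'   new archive begins
     *  'd' bytes...                              archive or manifest data
     *  'p' uint64 bytes_done (big-endian)        progress report
     *  'm'                                       manifest begins
     */
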
+/*
+ * Get a single byte from a CopyData message.
+ *
+ * Bail out if none remain.
+ */
+static char
+GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
+{
+ if (*cursor >= r)
+ ReportCopyDataParseError(r, copybuf);
+
+ return copybuf[(*cursor)++];
+}
+
+/*
+ * Get a NUL-terminated string from a CopyData message.
+ *
+ * Bail out if the terminating NUL cannot be found.
+ */
+static char *
+GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
+{
+ size_t startpos = *cursor;
+ size_t endpos = startpos;
+
+ while (1)
+ {
+ if (endpos >= r)
+ ReportCopyDataParseError(r, copybuf);
+ if (copybuf[endpos] == '\0')
+ break;
+ ++endpos;
+ }
+
+ *cursor = endpos + 1;
+ return &copybuf[startpos];
+}
+
+/*
+ * Get an unsigned 64-bit integer from a CopyData message.
+ *
+ * Bail out if there are not at least 8 bytes remaining.
+ */
+static uint64
+GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
+{
+ uint64 result;
+
+ if (*cursor + sizeof(uint64) > r)
+ ReportCopyDataParseError(r, copybuf);
+ memcpy(&result, &copybuf[*cursor], sizeof(uint64));
+ *cursor += sizeof(uint64);
+ return pg_ntoh64(result);
+}
+
+/*
+ * Bail out if we didn't parse the whole message.
+ */
+static void
+GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
+{
+ if (r != cursor)
+ ReportCopyDataParseError(r, copybuf);
+}
+
+/*
+ * Report failure to parse a CopyData message from the server. Then exit.
+ *
+ * As a debugging aid, we try to give some hint about what kind of message
+ * provoked the failure. Perhaps this is not detailed enough, but it's not
+ * clear that it's worth expending any more code on what should be a
+ * can't-happen case.
+ */
+static void
+ReportCopyDataParseError(size_t r, char *copybuf)
+{
+ if (r == 0)
+ pg_log_error("empty COPY message");
+ else
+ pg_log_error("malformed COPY message of type %d, length %zu",
+ copybuf[0], r);
+ exit(1);
+}
+
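For anyone adding a message type later, the intended usage pattern for these
helpers looks like this (the message shape here is hypothetical, not
something the server sends today):

    static void
    parse_hypothetical_message(size_t r, char *copybuf)
    {
        size_t  cursor = 0;
        char    type;
        char   *name;
        uint64  value;

        type = GetCopyDataByte(r, copybuf, &cursor);    /* sub-type byte */
        name = GetCopyDataString(r, copybuf, &cursor);  /* NUL-terminated */
        value = GetCopyDataUInt64(r, copybuf, &cursor); /* 8 bytes, big-endian */
        GetCopyDataEnd(r, copybuf, cursor);             /* reject trailing junk */
    }

Any failure inside the helpers ends in ReportCopyDataParseError(), so callers
don't need their own error paths.
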
/*
* Receive raw tar data from the server, and stream it to the appropriate
* location. If we're writing a single tarfile to standard output, also
@@ -1375,6 +1703,10 @@ BaseBackup(void)
"MANIFEST_CHECKSUMS", manifest_checksums);
}
+ if (serverMajor >= 1500)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", "client");
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -1497,46 +1829,56 @@ BaseBackup(void)
StartLogStreamer(xlogstart, starttli, sysidentifier);
}
- /* Receive a tar file for each tablespace in turn */
- for (i = 0; i < PQntuples(res); i++)
+ if (serverMajor >= 1500)
{
- char archive_name[MAXPGPATH];
- char *spclocation;
-
- /*
- * If we write the data out to a tar file, it will be named base.tar
- * if it's the main data directory or <tablespaceoid>.tar if it's for
- * another tablespace. CreateBackupStreamer() will arrange to add .gz
- * to the archive name if pg_basebackup is performing compression.
- */
- if (PQgetisnull(res, i, 0))
- {
- strlcpy(archive_name, "base.tar", sizeof(archive_name));
- spclocation = NULL;
- }
- else
+ /* Receive a single tar stream with everything. */
+ ReceiveArchiveStream(conn);
+ }
+ else
+ {
+ /* Receive a tar file for each tablespace in turn */
+ for (i = 0; i < PQntuples(res); i++)
{
- snprintf(archive_name, sizeof(archive_name),
- "%s.tar", PQgetvalue(res, i, 0));
- spclocation = PQgetvalue(res, i, 1);
+ char archive_name[MAXPGPATH];
+ char *spclocation;
+
+ /*
+ * If we write the data out to a tar file, it will be named
+ * base.tar if it's the main data directory or <tablespaceoid>.tar
+ * if it's for another tablespace. CreateBackupStreamer() will
+ * arrange to add .gz to the archive name if pg_basebackup is
+ * performing compression.
+ */
+ if (PQgetisnull(res, i, 0))
+ {
+ strlcpy(archive_name, "base.tar", sizeof(archive_name));
+ spclocation = NULL;
+ }
+ else
+ {
+ snprintf(archive_name, sizeof(archive_name),
+ "%s.tar", PQgetvalue(res, i, 0));
+ spclocation = PQgetvalue(res, i, 1);
+ }
+
+ ReceiveTarFile(conn, archive_name, spclocation, i);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ /*
+ * Now receive backup manifest, if appropriate.
+ *
+ * If we're writing a tarfile to stdout, ReceiveTarFile will have
+ * already processed the backup manifest and included it in the output
+ * tarfile. Such a configuration doesn't allow for writing multiple
+ * files.
+ *
+ * If we're talking to an older server, it won't send a backup
+ * manifest, so don't try to receive one.
+ */
+ if (!writing_to_stdout && manifest)
+ ReceiveBackupManifest(conn);
}
- /*
- * Now receive backup manifest, if appropriate.
- *
- * If we're writing a tarfile to stdout, ReceiveTarFile will have already
- * processed the backup manifest and included it in the output tarfile.
- * Such a configuration doesn't allow for writing multiple files.
- *
- * If we're talking to an older server, it won't send a backup manifest,
- * so don't try to receive one.
- */
- if (!writing_to_stdout && manifest)
- ReceiveBackupManifest(conn);
-
if (showprogress)
{
progress_filename = NULL;
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index e6c073c567..36b9b76c5f 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,6 +282,7 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
+extern bbsink *bbsink_copystream_new(void);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
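For clarity, with this patch a pg_basebackup run against a v15 server ends up
issuing something along these lines (the exact option list varies with the
flags given):

    BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,
        MANIFEST 'yes', TARGET 'client' )

and all of the archives plus the manifest come back in a single COPY stream,
rather than one COPY response per tablespace as in the old copy-tablespace
protocol.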
Attachment: v10-0003-Support-base-backup-targets.patch (application/octet-stream)
From 6830759ba115e6996959d8621135b48c4d87c5b4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 16 Nov 2021 15:20:50 -0500
Subject: [PATCH v10 3/4] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/protocol.sgml | 23 +-
doc/src/sgml/ref/pg_basebackup.sgml | 30 ++
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 +++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 302 ++++++++++++++++++++
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 203 ++++++++++---
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
10 files changed, 610 insertions(+), 62 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 7e59edb1cc..cd6dca691e 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2640,8 +2640,27 @@ The commands accepted in replication mode are:
</para>
<para>
- At present, the only supported value for this parameter is
- <literal>client</literal>.
+ If the target is <literal>client</literal>, the backup data is
+ sent to the client. If it is <literal>server</literal>, the backup
+ data is written to the server at the pathname specified by the
+ <literal>TARGET_DETAIL</literal> option. If it is
+ <literal>blackhole</literal>, the backup data is not sent
+ anywhere; it is simply discarded.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>TARGET_DETAIL</literal> <replaceable>'detail'</replaceable></term>
+ <listitem>
+ <para>
+ Provides additional information about the backup target.
+ </para>
+
+ <para>
+ Currently, this option can only be used when the backup target is
+ <literal>server</literal>. It specifies the server directory
+ to which the backup should be written.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..165a9ea5cc 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,36 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">target</replaceable></option></term>
+ <term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Instructs the server where to place the base backup. The default target
+ is <literal>client</literal>, which specifies that the backup should
+ be sent to the machine where <application>pg_basebackup</application>
+ is running. If the target is instead set to
+ <literal>server:/some/path</literal>, the backup will be stored on
+ the machine where the server is running in the
+ <literal>/some/path</literal> directory. Storing a backup on the
+ server requires superuser privileges. If the target is set to
+ <literal>blackhole</literal> causes the contents of the backup to be
+ discarded and not stored anywhere. This should only be used for
+ testing purposes, as you will not end up with an actual backup.
+ </para>
+
+ <para>
+ Since WAL streaming is implemented by
+ <application>pg_basebackup</application> rather than by the server,
+ this option cannot be used together with <literal>-Xstream</literal>.
+ Since that is the default, when this option is specified, you must also
+ specify either <literal>-Xfetch</literal> or <literal>-Xnone</literal>.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
<term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0d5acbf26..7f37630e6c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -702,6 +705,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -847,25 +852,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -877,6 +892,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
@@ -908,14 +939,38 @@ SendBaseBackup(BaseBackupCmd *cmd)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt.target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt.target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt.target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt.target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 57183f4d46..2e9058b041 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -131,11 +134,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -208,8 +212,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -290,8 +298,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..ce1b7b4797
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,302 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
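The switch above covers every pg_check_dir() return value; for reference,
those are (per src/common/pgcheckdir.c):

    /*
     *   0   directory does not exist
     *   1   exists and is empty
     *   2   exists, contains only dot files
     *   3   exists, contains a mount point (lost+found)
     *   4   exists and is not empty
     *  -1   could not access the directory; errno is set
     */
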
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write a chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
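To spell out the ordering guarantee this gives us (a summary of the two
functions above, not additional code):

    /*
     * 1. write <target>/backup_manifest.tmp
     * 2. FileClose() the temporary file
     * 3. durable_rename() to backup_manifest (fsyncs file and directory)
     *
     * So a backup_manifest under its final name implies the backup
     * completed, while a leftover .tmp file implies it did not.
     */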
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 4a5b7502f5..3b5e6b799a 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -510,6 +510,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 47656fc060..4c9498c368 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -115,7 +115,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -132,6 +132,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -364,6 +365,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1231,15 +1234,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true, false);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true, false);
+ }
break;
}
@@ -1311,24 +1321,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed to inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1697,13 +1715,41 @@ BaseBackup(void)
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1798,8 +1844,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first one since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1810,7 +1861,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1893,7 +1945,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2027,8 +2079,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2050,7 +2105,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2084,6 +2139,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2134,7 +2190,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2175,6 +2231,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2307,18 +2366,50 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, the default format
+ * is plain.
*/
- if (basedir == NULL)
+ if (backup_target != NULL && format != '\0')
{
- pg_log_error("no target directory specified");
+ pg_log_error("cannot specify both format and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ if (format == '\0')
+ format = 'p';
/*
- * Mutually exclusive arguments
+ * Either directory or backup target should be specified, but not both
+ */
+ if (basedir == NULL && backup_target == NULL)
+ {
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
@@ -2328,6 +2419,16 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal == STREAM_WAL)
+ {
+ pg_log_error("WAL cannot be streamed when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2344,6 +2445,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2377,8 +2481,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2399,6 +2513,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2406,6 +2521,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2415,6 +2533,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2457,11 +2578,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 36b9b76c5f..0e337a86f4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,9 +282,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index c22142365f..0b20981614 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -153,6 +153,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
--
2.24.3 (Apple Git-128)
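Putting 0003 together: for something like BASE_BACKUP ( TARGET 'server',
TARGET_DETAIL '/backups/x', MAX_RATE 4096 ), SendBaseBackup now builds the
sink chain roughly as follows -- a condensed restatement of the code above,
not new logic:

    sink = bbsink_copystream_new(false);    /* progress messages only */
    sink = bbsink_server_new(sink, opt.target_detail);
    if (opt.maxrate > 0)
        sink = bbsink_throttle_new(sink, opt.maxrate);
    sink = bbsink_progress_new(sink, opt.progress);

Each constructor stores the prior sink as its bbs_next, so data flows
progress -> throttle -> server -> copystream; with send_to_client = false the
copystream sink still reports progress to the client but ships no archive
data. Patch 0004, below, slots bbsink_gzip_new() in between the throttle and
progress sinks.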
Attachment: v10-0004-Server-side-gzip-compression.patch (application/octet-stream)
From a6a0dbecd155cc1ded5ce59e84cf8827676e9b42 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 5 Nov 2021 10:05:02 -0400
Subject: [PATCH v10 4/4] Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++-
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 304 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 43 ++-
src/include/replication/basebackup_sink.h | 1 +
7 files changed, 415 insertions(+), 4 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 165a9ea5cc..9ce8b8d89d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,31 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--server-compression=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Allows the tar files generated for each tablespace to be compressed
+ on the server, before they are sent to the client. The default value
+ is <literal>none</literal>, which performs no compression. If set
+ to <literal>gzip</literal>, compression is performed using gzip and
+ the suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9</literal>
+ will provide the maximum compression that the <literal>gzip</literal>
+ algorithm can provide.
+ </para>
+ <para>
+ Since the write-ahead logs are fetched via a separate client
+ connection, they cannot be compressed using this option. See also
+ the <literal>--gzip</literal> and <literal>--compress</literal>
+ options.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-t <replaceable class="parameter">target</replaceable></option></term>
<term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
@@ -405,7 +430,9 @@ PostgreSQL documentation
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ automatically be added to all tar filenames. When this option is
+ used, compression is performed on the client side;
+ see also <literal>--server-compression</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 7f37630e6c..ff26537679 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -707,11 +715,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -881,6 +891,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -975,6 +1010,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt.compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt.compression_level);
+
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..432423bd55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,304 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4c9498c368..b76e00818f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -139,6 +139,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -373,13 +374,15 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
+ printf(_(" --server-compression=none|gzip|gzip[1-9]\n"
+ " compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
- printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -z, --gzip compress tar output on client\n"));
+ printf(_(" -Z, --compress=0-9 compress tar output on client with given compression level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -998,7 +1001,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1007,13 +1012,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
+ * However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1753,6 +1777,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2163,6 +2198,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2342,6 +2378,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 0e337a86f4..6bfea35c22 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
Hi Robert,
Please find the lz4 compression patch here that basically has:
1. Documentation
2. pgindent run over it.
3. Your comments addressed for using "+=".
I have not included the compression level per your comment below:
---------
"On second thought, maybe we don't need to do this. There's a thread on
"Teach pg_receivewal to use lz4 compression" which concluded that
supporting different compression levels was unnecessary."
---------
Regards,
Jeevan Ladhe
On Wed, Nov 17, 2021 at 3:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 15, 2021 at 2:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, that's what it should be doing. I'll commit a fix, thanks for
the report and diagnosis.

Here's a new patch set.

0001 - When I committed the patch to add the missing 2 blocks of zero
bytes to the tar archives generated by the server, I failed to adjust
the documentation. So 0001 does that. This is the only new patch in
the series. I was not sure whether to just remove the statement from
the documentation saying that those blocks aren't included, or whether
to mention that we used to include them and no longer do. I went for
the latter; opinions welcome.

0002 - This adds a new COPY subprotocol for taking base backups. I've
improved it over the previous version by adding documentation. I'm
still seeking comments on the points I raised in
/messages/by-id/CA+TgmobrOXbDh+hCzzVkD3weV3R-QRy3SPa=FRb_Rv9wF5iPJw@mail.gmail.com
but what I'm leaning toward doing is committing the patch as is and
then submitting a patch - or maybe several patches - later to rip some
of this and a few other old things out. That way the debate - or lack
thereof - about what to do here doesn't have to block the main patch
set, and also, it feels safer to make removing the existing stuff a
separate effort rather than doing it now.

0003 - This adds "server" and "blackhole" as backup targets. In this
version, I've improved the documentation. Also, the previous version
only let you use a backup target with -Xnone, and I realized that was
stupid. -Xfetch is OK too. -Xstream still doesn't work, since that's
implemented via client-side logic. I think this still needs some work
to be committable, like adding tests, but I don't expect to make any
major changes.

0004 - Server-side gzip compression. Similar level of maturity to 0003.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v8-0001-LZ4-compression.patch
From ac7f611bd62f81a408cf652f4a0af906a515b3cb Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed, 17 Nov 2021 19:43:20 +0530
Subject: [PATCH] LZ4 compression
---
doc/src/sgml/ref/pg_basebackup.sgml | 49 +++-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_lz4.c | 285 ++++++++++++++++++++++
src/include/replication/basebackup_sink.h | 1 +
5 files changed, 334 insertions(+), 9 deletions(-)
create mode 100644 src/backend/replication/basebackup_lz4.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9ce8b8d89d..44395a749b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -230,14 +230,7 @@ PostgreSQL documentation
<para>
Allows the tar files generated for each tablespace to be compressed
- on the server, before they are sent to the client. The default value
- is <literal>none</literal>, which performs no compression. If set
- to <literal>gzip</literal>, compression is performed using gzip and
- the suffix <filename>.gz</filename> will automatically be added to
- compressed files. A numeric digit between 1 and 9 can be added to
- specify the compression level; for instance, <literal>gzip9</literal>
- will provide the maximum compression that the <literal>gzip</literal>
- algorithm can provide.
+ on the server, before they are sent to the client.
</para>
<para>
Since the write-ahead logs are fetched via a separate client
@@ -245,7 +238,47 @@ PostgreSQL documentation
the <literal>--gzip</literal> and <literal>--compress</literal>
options.
</para>
+ <para>
+ The following compression algorithms for
+ server-compression are supported:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>none</literal></term>
+ <listitem>
+ <para>
+ Perform no compression. This is the default value.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>gzip</literal></term>
+ <listitem>
+ <para>
+ Compression is performed using <literal>gzip</literal> and the
+ suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9
+ </literal> will provide the maximum compression that the
+ <literal>gzip</literal> algorithm can provide.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>lz4</literal></term>
+ <listitem>
+ <para>
+ Compression is performed using <literal>lz4</literal> and the
+ suffix <filename>.lz4</filename> will automatically be added to
+ compressed files.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ff26537679..aab6744f73 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -909,6 +910,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1013,6 +1016,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..0f49def813
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,285 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires the output buffer of size equal or
+ * greater than that of LZ4F_compressBound(), make sure we have the next
+ * sink's bbs_buffer of length that can accommodate the compressed input
+ * buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written += headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed by
+ * LZ4F_compressBound(), ask the next sink to process the data so that we
+ * can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written += compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written += compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling LZ4F_freeCompressionContext() if needed to avoid memory leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 6bfea35c22..2558ce5ca2 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
Please find the lz4 compression patch here that basically has:
Thanks. Could you please rebase your patch? It is failing at my end -
[edb@centos7tushar pg15_lz]$ git apply /tmp/v8-0001-LZ4-compression.patch
error: patch failed: doc/src/sgml/ref/pg_basebackup.sgml:230
error: doc/src/sgml/ref/pg_basebackup.sgml: patch does not apply
error: patch failed: src/backend/replication/Makefile:19
error: src/backend/replication/Makefile: patch does not apply
error: patch failed: src/backend/replication/basebackup.c:64
error: src/backend/replication/basebackup.c: patch does not apply
error: patch failed: src/include/replication/basebackup_sink.h:285
error: src/include/replication/basebackup_sink.h: patch does not apply
--
regards,
tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Hi Tushar,
You need to apply Robert's v10 version patches 0002, 0003 and 0004 before
applying the lz4 patch (v8 version).
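For example (the v10 patch file names below are hypothetical; adjust them
to whatever the attachments are actually called), something like:

    git apply v10-0002-*.patch v10-0003-*.patch v10-0004-*.patch
    git apply /tmp/v8-0001-LZ4-compression.patch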
Please let me know if you still face any issues.
Regards,
Jeevan Ladhe
On 12/28/21 1:11 PM, Jeevan Ladhe wrote:
You need to apply Robert's v10 version patches 0002, 0003 and 0004,
before applying the lz4 patch(v8 version).
Thanks, able to apply now.
--
regards,
tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
Please find the lz4 compression patch here that basically has:
One small issue: in "pg_basebackup --help", we are not displaying the
lz4 value under the --server-compression option
[edb@tusharcentos7-v14 bin]$ ./pg_basebackup --help | grep
server-compression
--server-compression=none|gzip|gzip[1-9]
--
regards,
tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
Please find the lz4 compression patch here that basically has:
Please refer to this scenario, where --server-compression is only
compressing the base backup into lz4 format, but not the pg_wal
directory:
[edb@centos7tushar bin]$ ./pg_basebackup -Ft --server-compression=lz4
-Xstream -D foo
[edb@centos7tushar bin]$ ls foo
backup_manifest base.tar.lz4 pg_wal.tar
The same is valid for gzip as well, if server-compression is set to gzip:
[edb@centos7tushar bin]$ ./pg_basebackup -Ft --server-compression=gzip4
-Xstream -D foo1
[edb@centos7tushar bin]$ ls foo1
backup_manifest base.tar.gz pg_wal.tar
If this scenario is valid, then both folders should be in lz4 format;
otherwise we should get an error, something like - not a valid option?
--
regards,
tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Mon, Jan 3, 2022 at 12:12 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
Before sending an email like this, it would be a good idea to read the
documentation for the --server-compression option.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 1/4/22 8:07 PM, Robert Haas wrote:
Before sending an email like this, it would be a good idea to read the
documentation for the --server-compression option.
Sure, Thanks Robert.
Here is one scenario where I feel the error message is confusing; if this
is not supported at all, then the error message needs to be a bit clearer.
If we use -z (or -Z) with -t, we get this error:
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/test0 -Xfetch -z
pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.
but after removing the -z option, the backup is in tar mode only:
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/test0 -Xfetch
[edb@centos7tushar bin]$ ls /tmp/test0
backup_manifest base.tar
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Wed, Jan 5, 2022 at 5:11 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
OK, fair enough, I can adjust the error message for that case.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks, Jeevan.
I tested the --server-compression option with various other options of
pg_basebackup, and also checked that -t/--server-compression from a v15
pg_basebackup will throw an error if the server version is v14 or below.
Things are looking good to me.
Two open issues -
1)lz4 value is missing for --server-compression in pg_basebackup --help
2)Error messages need to improve if using -t server with -z/-Z
regards,
Hi,
Similar to LZ4 server-side compression, I have also tried to add ZSTD
server-side compression in the attached patch. I have done some initial
testing and things seem to be working.
Example run:
pg_basebackup -t server:/tmp/data_zstd -Xnone --server-compression=zstd
The patch surely needs some grooming, but I am expecting some initial
review, especially in the area where we are trying to close the zstd
stream in bbsink_zstd_end_archive(). We need to tell the zstd library to
end the compression by calling ZSTD_compressStream2(), thereby sending a
ZSTD_e_end flag. But this also needs some input, which, per the
example[1] (line 686), I have taken as an empty ZSTD_inBuffer.
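To make that concrete, here is a minimal, hypothetical sketch of what I
mean (not the patch itself; cctx, out_buf, and out_buf_len are assumed
names for the compression context and the next sink's buffer):

    ZSTD_inBuffer in = {NULL, 0, 0};    /* empty input: nothing left to compress */
    ZSTD_outBuffer out = {out_buf, out_buf_len, 0};
    size_t remaining;

    do
    {
        /* ZSTD_e_end flushes internal buffers and writes the frame epilogue */
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining))
            elog(ERROR, "could not end zstd compression: %s",
                 ZSTD_getErrorName(remaining));
        /* forward out.pos bytes to the next sink here, then reset out.pos */
    } while (remaining > 0);    /* 0 means the frame is fully flushed */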
Thanks, Tushar for testing the LZ4 patch. I have added the LZ4 option in
the pg_basebackup help now.
Note: Before applying these patches please apply Robert's v10 version
of patches 0002, 0003, and 0004.
[1]: https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/tools/zbi/zbi.cc
Regards,
Jeevan Ladhe
Attachments:
v9-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
From 80aa8cb9ecbeb3303562129ab13a772aa29dd1b4 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Tue, 18 Jan 2022 19:46:36 +0530
Subject: [PATCH 1/2] Add a LZ4 compression method for server side compression.
Adds LZ4 server side compression option --server-compression=lz4
Add documentation for LZ4.
Add pg_basebackup help for LZ4 option
Example:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4
---
doc/src/sgml/ref/pg_basebackup.sgml | 49 +++-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_lz4.c | 285 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
src/include/replication/basebackup_sink.h | 1 +
6 files changed, 335 insertions(+), 10 deletions(-)
create mode 100644 src/backend/replication/basebackup_lz4.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9ce8b8d89d..44395a749b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -230,14 +230,7 @@ PostgreSQL documentation
<para>
Allows the tar files generated for each tablespace to be compressed
- on the server, before they are sent to the client. The default value
- is <literal>none</literal>, which performs no compression. If set
- to <literal>gzip</literal>, compression is performed using gzip and
- the suffix <filename>.gz</filename> will automatically be added to
- compressed files. A numeric digit between 1 and 9 can be added to
- specify the compression level; for instance, <literal>gzip9</literal>
- will provide the maximum compression that the <literal>gzip</literal>
- algorithm can provide.
+ on the server, before they are sent to the client.
</para>
<para>
Since the write-ahead logs are fetched via a separate client
@@ -245,7 +238,47 @@ PostgreSQL documentation
the <literal>--gzip</literal> and <literal>--compress</literal>
options.
</para>
+ <para>
+ The following compression algorithms for
+ server-compression are supported:
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>none</literal></term>
+ <listitem>
+ <para>
+ Perform no compression. This is the default value.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>gzip</literal></term>
+ <listitem>
+ <para>
+ Compression is performed using <literal>gzip</literal> and the
+ suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9
+ </literal> will provide the maximum compression that the
+ <literal>gzip</literal> algorithm can provide.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>lz4</literal></term>
+ <listitem>
+ <para>
+ Compression is performed using <literal>lz4</literal> and the
+ suffix <filename>.lz4</filename> will automatically be added to
+ compressed files.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 4bed0f18b7..9dea1c9bcc 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -909,6 +910,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
opt->compression_level = optval[4] - '0';
}
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1013,6 +1016,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..0f49def813
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,285 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+#include <unistd.h>
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression. */
+bbsink *
+bbsink_lz4_new(bbsink *next)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires the output buffer of size equal or
+ * greater than that of LZ4F_compressBound(), make sure we have the next
+ * sink's bbs_buffer of length that can accommodate the compressed input
+ * buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written += headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed by
+ * LZ4F_compressBound(), ask the next sink to process the data so that we
+ * can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written += compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written += compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling LZ4F_freeCompressionContext() if needed to avoid memory leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 00fa55b982..d8da1cb2e9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -374,7 +374,7 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
- printf(_(" --server-compression=none|gzip|gzip[1-9]\n"
+ printf(_(" --server-compression=none|gzip|gzip[1-9]|lz4\n"
" compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index d3276b2487..964752ef5d 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
v9-0002-Add-a-ZSTD-compression-method-for-server-side-compre.patch
From 5b06f5b1039b51f0847e7c310c04a61308b3c7b9 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Tue, 18 Jan 2022 19:48:33 +0530
Subject: [PATCH 2/2] Add a ZSTD compression method for server side
compression.
This patch introduces --server-compression=zstd option.
Add config option --with-zstd.
Add documentation for ZSTD option
Add pg_basebackup help for ZSTD option
Example: pg_basebackup -t server:/tmp/data_zstd -Xnone --server-compression=zstd
---
configure | 240 ++++++++++++++++++-
configure.ac | 32 +++
doc/src/sgml/ref/pg_basebackup.sgml | 11 +
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_zstd.c | 267 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
src/include/pg_config.h.in | 6 +
src/include/replication/basebackup_sink.h | 1 +
9 files changed, 559 insertions(+), 8 deletions(-)
create mode 100644 src/backend/replication/basebackup_zstd.c
diff --git a/configure b/configure
index 9c856cb1d5..a532e85e66 100755
--- a/configure
+++ b/configure
@@ -699,6 +699,9 @@ with_gnu_ld
LD
LDFLAGS_SL
LDFLAGS_EX
+ZSTD_LIBS
+ZSTD_CFLAGS
+with_zstd
LZ4_LIBS
LZ4_CFLAGS
with_lz4
@@ -800,6 +803,7 @@ infodir
docdir
oldincludedir
includedir
+runstatedir
localstatedir
sharedstatedir
sysconfdir
@@ -868,6 +872,7 @@ with_libxslt
with_system_tzdata
with_zlib
with_lz4
+with_zstd
with_gnu_ld
with_ssl
with_openssl
@@ -897,6 +902,8 @@ XML2_CFLAGS
XML2_LIBS
LZ4_CFLAGS
LZ4_LIBS
+ZSTD_CFLAGS
+ZSTD_LIBS
LDFLAGS_EX
LDFLAGS_SL
PERL
@@ -941,6 +948,7 @@ datadir='${datarootdir}'
sysconfdir='${prefix}/etc'
sharedstatedir='${prefix}/com'
localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
includedir='${prefix}/include'
oldincludedir='/usr/include'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1193,6 +1201,15 @@ do
| -silent | --silent | --silen | --sile | --sil)
silent=yes ;;
+ -runstatedir | --runstatedir | --runstatedi | --runstated \
+ | --runstate | --runstat | --runsta | --runst | --runs \
+ | --run | --ru | --r)
+ ac_prev=runstatedir ;;
+ -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+ | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+ | --run=* | --ru=* | --r=*)
+ runstatedir=$ac_optarg ;;
+
-sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
ac_prev=sbindir ;;
-sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1330,7 +1347,7 @@ fi
for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
- libdir localedir mandir
+ libdir localedir mandir runstatedir
do
eval ac_val=\$$ac_var
# Remove trailing slashes.
@@ -1483,6 +1500,7 @@ Fine tuning of the installation directories:
--sysconfdir=DIR read-only single-machine data [PREFIX/etc]
--sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
--localstatedir=DIR modifiable single-machine data [PREFIX/var]
+ --runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run]
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -1576,6 +1594,7 @@ Optional Packages:
use system time zone data in DIR
--without-zlib do not use Zlib
--with-lz4 build with LZ4 support
+ --with-zstd build with ZSTD support
--with-gnu-ld assume the C compiler uses GNU ld [default=no]
--with-ssl=LIB use LIB for SSL/TLS support (openssl)
--with-openssl obsolete spelling of --with-ssl=openssl
@@ -1605,6 +1624,8 @@ Some influential environment variables:
XML2_LIBS linker flags for XML2, overriding pkg-config
LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config
LZ4_LIBS linker flags for LZ4, overriding pkg-config
+ ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config
+ ZSTD_LIBS linker flags for ZSTD, overriding pkg-config
LDFLAGS_EX extra linker flags for linking executables only
LDFLAGS_SL extra linker flags for linking shared libraries only
PERL Perl program
@@ -9033,6 +9054,146 @@ fi
done
fi
+#
+# ZSTD
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with ZSTD support" >&5
+$as_echo_n "checking whether to build with ZSTD support... " >&6; }
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+ withval=$with_zstd;
+ case $withval in
+ yes)
+
+$as_echo "#define USE_ZSTD 1" >>confdefs.h
+
+ ;;
+ no)
+ :
+ ;;
+ *)
+ as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+ ;;
+ esac
+
+else
+ with_zstd=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_zstd" >&5
+$as_echo "$with_zstd" >&6; }
+
+
+if test "$with_zstd" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
+$as_echo_n "checking for libzstd... " >&6; }
+
+if test -n "$ZSTD_CFLAGS"; then
+ pkg_cv_ZSTD_CFLAGS="$ZSTD_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+if test -n "$ZSTD_LIBS"; then
+ pkg_cv_ZSTD_LIBS="$ZSTD_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+ _pkg_short_errors_supported=yes
+else
+ _pkg_short_errors_supported=no
+fi
+ if test $_pkg_short_errors_supported = yes; then
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libzstd" 2>&1`
+ else
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libzstd" 2>&1`
+ fi
+ # Put the nasty error message in config.log where it belongs
+ echo "$ZSTD_PKG_ERRORS" >&5
+
+ as_fn_error $? "Package requirements (libzstd) were not met:
+
+$ZSTD_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+ ZSTD_CFLAGS=$pkg_cv_ZSTD_CFLAGS
+ ZSTD_LIBS=$pkg_cv_ZSTD_LIBS
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -13136,6 +13297,56 @@ fi
fi
+if test "$with_zstd" = yes ; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+ Use char because int might match the return type of a GCC
+ builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+ ;
+ return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+ ac_cv_lib_zstd_ZSTD_compress=yes
+else
+ ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+ LIBS="-lzstd $LIBS"
+
+else
+ as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -13856,6 +14067,23 @@ done
fi
+if test "$with_zstd" = yes; then
+ for ac_header in zstd.h
+do :
+ ac_fn_c_check_header_mongrel "$LINENO" "zstd.h" "ac_cv_header_zstd_h" "$ac_includes_default"
+if test "x$ac_cv_header_zstd_h" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_ZSTD_H 1
+_ACEOF
+
+else
+ as_fn_error $? "zstd.h header file is required for ZSTD" "$LINENO" 5
+fi
+
+done
+
+fi
+
if test "$with_gssapi" = yes ; then
for ac_header in gssapi/gssapi.h
do :
@@ -15259,7 +15487,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15305,7 +15533,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15329,7 +15557,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15374,7 +15602,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15398,7 +15626,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
diff --git a/configure.ac b/configure.ac
index 95287705f6..85e15ff9f8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1056,6 +1056,30 @@ if test "$with_lz4" = yes; then
done
fi
+#
+# ZSTD
+#
+AC_MSG_CHECKING([whether to build with ZSTD support])
+PGAC_ARG_BOOL(with, zstd, no, [build with ZSTD support],
+ [AC_DEFINE([USE_ZSTD], 1, [Define to 1 to build with ZSTD support. (--with-zstd)])])
+AC_MSG_RESULT([$with_zstd])
+AC_SUBST(with_zstd)
+
+if test "$with_zstd" = yes; then
+ PKG_CHECK_MODULES(ZSTD, libzstd)
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -1325,6 +1349,10 @@ if test "$with_lz4" = yes ; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
+if test "$with_zstd" = yes ; then
+ AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])])
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -1488,6 +1516,10 @@ if test "$with_lz4" = yes; then
AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])
fi
+if test "$with_zstd" = yes; then
+ AC_CHECK_HEADERS(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])])
+fi
+
if test "$with_gssapi" = yes ; then
AC_CHECK_HEADERS(gssapi/gssapi.h, [],
[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 44395a749b..5cadadf16c 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -276,6 +276,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>zstd</literal></term>
+ <listitem>
+ <para>
+ Compression is performed using <literal>zstd</literal> and the
+ suffix <filename>.zst</filename> will automatically be added to
+ compressed files.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..2e6de7007f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -20,6 +20,7 @@ OBJS = \
basebackup_copy.o \
basebackup_gzip.o \
basebackup_lz4.o \
+ basebackup_zstd.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 9dea1c9bcc..12992b0a4d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -65,7 +65,8 @@ typedef enum
{
BACKUP_COMPRESSION_NONE,
BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
} basebackup_compression_type;
typedef struct
@@ -912,6 +913,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(optval, "lz4") == 0)
opt->compression = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(optval, "zstd") == 0)
+ opt->compression = BACKUP_COMPRESSION_ZSTD;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1018,6 +1021,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
sink = bbsink_gzip_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink);
+ else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
+ sink = bbsink_zstd_new(sink);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
new file mode 100644
index 0000000000..a4bba94e7e
--- /dev/null
+++ b/src/backend/replication/basebackup_zstd.c
@@ -0,0 +1,267 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_zstd.c
+ * Basebackup sink implementing zstd compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbsink_zstd
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbsink_zstd;
+
+static void bbsink_zstd_begin_backup(bbsink *sink);
+static void bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_zstd_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_zstd_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_zstd_end_archive(bbsink *sink);
+static void bbsink_zstd_cleanup(bbsink *sink);
+static void bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+const bbsink_ops bbsink_zstd_ops = {
+ .begin_backup = bbsink_zstd_begin_backup,
+ .begin_archive = bbsink_zstd_begin_archive,
+ .archive_contents = bbsink_zstd_archive_contents,
+ .end_archive = bbsink_zstd_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_zstd_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_zstd_end_backup,
+ .cleanup = bbsink_zstd_cleanup
+};
+#endif
+
+/* Create a new basebackup sink that performs zstd compression. */
+bbsink *
+bbsink_zstd_new(bbsink *next)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+#else
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+
+ sink = palloc0(sizeof(bbsink_zstd));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
+ sink->base.bbs_next = next;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_zstd_begin_backup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t output_buffer_bound;
+
+ mysink->cctx = ZSTD_createCCtx();
+ if (!mysink->cctx)
+ elog(ERROR, "could not create zstd compression context");
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Make sure that the next sink's bbs_buffer is big enough to accommodate
+ * the compressed input buffer.
+ */
+ output_buffer_bound = ZSTD_compressBound(mysink->base.bbs_buffer_length);
+
+ /*
+ * The next sink's buffer length is expected to be a multiple of BLCKSZ,
+ * so round the compression bound up to the next such multiple.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ char *zstd_archive_name;
+
+ /*
+ * At the start of each archive we reset the state to begin a new
+ * compression operation. Any parameters set on the context are sticky;
+ * they survive this reset because we use ZSTD_reset_session_only.
+ */
+ ZSTD_CCtx_reset(mysink->cctx, ZSTD_reset_session_only);
+
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ /* Add ".zst" to the archive name. */
+ zstd_archive_name = psprintf("%s.zst", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, zstd_archive_name);
+ pfree(zstd_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_zstd_end_archive() is invoked.
+ */
+static void
+bbsink_zstd_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ ZSTD_inBuffer inBuf = { mysink->base.bbs_buffer, len, 0 };
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the output buffer does not have enough space left, send its
+ * contents to the next sink and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx, &mysink->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * There might be some data inside zstd's internal buffers; we need to flush
+ * that out, end the zstd frame, and then forward the result to the
+ * successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_zstd_end_archive(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx,
+ &mysink->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Free the resources and context.
+ */
+static void
+bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+
+ bbsink_forward_end_backup(sink, endptr, endtli);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling ZSTD_freeCCtx if needed, so as to avoid a memory leak.
+ */
+static void
+bbsink_zstd_cleanup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context if not already released. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d8da1cb2e9..b0c4a0f5b2 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -374,7 +374,7 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
- printf(_(" --server-compression=none|gzip|gzip[1-9]|lz4\n"
+ printf(_(" --server-compression=none|gzip|gzip[1-9]|lz4|zstd\n"
" compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 9d9bd6b9ef..61b2220eeb 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -325,6 +325,9 @@
/* Define to 1 if you have the `lz4' library (-llz4). */
#undef HAVE_LIBLZ4
+/* Define to 1 if you have the `zstd' library (-lzstd). */
+#undef HAVE_LIBZSTD
+
/* Define to 1 if you have the `m' library (-lm). */
#undef HAVE_LIBM
@@ -367,6 +370,9 @@
/* Define to 1 if you have the <lz4.h> header file. */
#undef HAVE_LZ4_H
+/* Define to 1 if you have the <zstd.h> header file. */
+#undef HAVE_ZSTD_H
+
/* Define to 1 if you have the <mbarrier.h> header file. */
#undef HAVE_MBARRIER_H
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 964752ef5d..8c18917a76 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -286,6 +286,7 @@ extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next);
+extern bbsink *bbsink_zstd_new(bbsink *next);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
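For illustration, a hypothetical build-and-use sequence for the options
added by the patch above (the host name and paths are placeholders):

    # Build the server with zstd support, per the configure changes above.
    ./configure --with-zstd

    # Take a tar-format base backup, compressed on the server with zstd.
    pg_basebackup -h myserver -Ft -D /var/backups/pg --server-compression=zstd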
On Tue, Nov 16, 2021 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Here's a new patch set.
And here's another one.
I've committed the first two patches from the previous set, the second
of those just today, and so we're getting down to the meat of the
patch set.
0001 adds "server" and "blackhole" as backup targets. It now has some
tests. This might be more or less ready to ship, unless somebody else
sees a problem, or I find one.
0002 adds server-side gzip compression. This one hasn't got tests yet.
Also, it's going to need some adjustment based on the parallel
discussion on the new options structure.
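For concreteness, a server running with these two patches applied could
be driven along these lines (the paths are placeholders):

    # 0001: write the backup on the server; WAL streaming must be disabled.
    pg_basebackup -X none --target=server:/var/backups/pg

    # 0001: discard the backup without storing it anywhere; testing only.
    pg_basebackup -X none --target=blackhole

    # 0002: tar-format backup to the client, gzip'd on the server at level 4.
    pg_basebackup -Ft -D /var/backups/pg --server-compression=gzip4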
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v11-0001-Support-base-backup-targets.patch
From 3efff1b594d803116a866bdf9aa500e376f02a13 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 16 Nov 2021 15:20:50 -0500
Subject: [PATCH v11 1/2] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/protocol.sgml | 23 +-
doc/src/sgml/ref/pg_basebackup.sgml | 30 ++
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 302 +++++++++++++++++++
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 208 ++++++++++---
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 64 +++-
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
11 files changed, 677 insertions(+), 64 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 7e59edb1cc..cd6dca691e 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2640,8 +2640,27 @@ The commands accepted in replication mode are:
</para>
<para>
- At present, the only supported value for this parameter is
- <literal>client</literal>.
+ If the target is <literal>client</literal>, the backup data is
+ sent to the client. If it is <literal>server</literal>, the backup
+ data is written to the server at the pathname specified by the
+ <literal>TARGET_DETAIL</literal> option. If it is
+ <literal>blackhole</literal>, the backup data is not sent
+ anywhere; it is simply discarded.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>TARGET_DETAIL</literal> <replaceable>'detail'</replaceable></term>
+ <listitem>
+ <para>
+ Provides additional information about the backup target.
+ </para>
+
+ <para>
+ Currently, this option can only be used when the backup target is
+ <literal>server</literal>. It specifies the server directory
+ to which the backup should be written.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..165a9ea5cc 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,36 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">target</replaceable></option></term>
+ <term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Instructs the server where to place the base backup. The default target
+ is <literal>client</literal>, which specifies that the backup should
+ be sent to the machine where <application>pg_basebackup</application>
+ is running. If the target is instead set to
+ <literal>server:/some/path</literal>, the backup will be stored on
+ the machine where the server is running in the
+ <literal>/some/path</literal> directory. Storing a backup on the
+ server requires superuser privileges. If the target is set to
+ <literal>blackhole</literal> causes the contents of the backup to be
+ discarded and not stored anywhere. This should only be used for
+ testing purposes, as you will not end up with an actual backup.
+ </para>
+
+ <para>
+ Since WAL streaming is implemented by
+ <application>pg_basebackup</application> rather than by the server,
+ this option cannot be used together with <literal>-Xstream</literal>.
+ Since that is the default, when this option is specified, you must also
+ specify either <literal>-Xfetch</literal> or <literal>-Xnone</literal>.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
<term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3afbbe7e02..d32da51535 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -702,6 +705,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -847,25 +852,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -877,6 +892,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
@@ -908,14 +939,38 @@ SendBaseBackup(BaseBackupCmd *cmd)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt.target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt.target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt.target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt.target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index f42b368c03..60b2d50a5a 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -131,11 +134,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -208,8 +212,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -290,8 +298,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..ce1b7b4797
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,302 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write each chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 0f5f18f02e..021b83de7a 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -522,6 +522,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2a58be638a..ec3b4f3c17 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -115,7 +115,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -132,6 +132,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -364,6 +365,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1232,15 +1235,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true, false);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true, false);
+ }
break;
}
@@ -1312,24 +1322,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1698,13 +1716,41 @@ BaseBackup(void)
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1799,8 +1845,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first once since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1811,7 +1862,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1894,7 +1946,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2028,8 +2080,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2051,7 +2106,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2085,6 +2140,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2135,7 +2191,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2176,6 +2232,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2308,27 +2367,72 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
+ */
+ if (backup_target != NULL && format != '\0')
+ {
+ pg_log_error("cannot specify both format and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (format == '\0')
+ format = 'p';
+
+ /*
+ * Either directory or backup target should be specified, but not both
*/
- if (basedir == NULL)
+ if (basedir == NULL && backup_target == NULL)
{
- pg_log_error("no target directory specified");
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
/*
- * Mutually exclusive arguments
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
- pg_log_error("only tar mode backups can be compressed");
+ if (backup_target == NULL)
+ pg_log_error("only tar mode backups can be compressed");
+ else
+ pg_log_error("client-side compression is not possible when a backup target is specfied");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal == STREAM_WAL)
+ {
+ pg_log_error("WAL cannot be streamed when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2345,6 +2449,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2378,8 +2485,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2400,6 +2517,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2407,6 +2525,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2416,6 +2537,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2458,11 +2582,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f0243f28d4..f7e21941eb 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 115;
+use Test::More tests => 135;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -474,6 +474,68 @@ $node->command_ok(
],
'pg_basebackup -X stream runs with --no-slot');
rmtree("$tempdir/backupnoslot");
+$node->command_ok(
+ [ @pg_basebackup_defs, '-D', "$tempdir/backupxf", '-X', 'fetch' ],
+ 'pg_basebackup -X fetch runs');
+
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole' ],
+ qr/WAL cannot be streamed when a backup target is specified/,
+ 'backup target requires -X');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'stream' ],
+ qr/WAL cannot be streamed when a backup target is specified/,
+ 'backup target requires -X other than -X stream');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'bogus', '-X', 'none' ],
+ qr/unrecognized target/,
+ 'backup target unrecognized');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'none', '-D', "$tempdir/blackhole" ],
+ qr/cannot specify both output directory and backup target/,
+ 'backup target and output directory');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'none', '-Ft' ],
+ qr/cannot specify both format and backup target/,
+ 'backup target and format');
+$node->command_ok(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'none' ],
+ 'backup target blackhole');
+$node->command_ok(
+ [ @pg_basebackup_defs, '--target', "server:$tempdir/backuponserver", '-X', 'none' ],
+ 'backup target server');
+ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created');
+rmtree("$tempdir/backuponserver");
+
+$node->command_fails(
+ [
+ @pg_basebackup_defs, '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+
+$node->command_fails(
+ [ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C' ],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ @pg_basebackup_defs, '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-D', "$tempdir/blackhole" ],
+ qr/cannot specify both output directory and backup target/,
+ 'backup target and output directory');
+
+$node->command_ok(
+ [ @pg_basebackup_defs, '-D', "$tempdir/backuptr/co", '-X', 'none' ],
+ 'pg_basebackup -X none runs');
$node->command_fails(
[
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 25436defa8..4acadf406d 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,9 +282,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index e0b2f56f47..395d325c5f 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -157,6 +157,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
--
2.24.3 (Apple Git-128)
v11-0002-Server-side-gzip-compression.patch (application/octet-stream)
From bf4a54d2ffc6ea55f5927b7fe8df4caef80d25aa Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 5 Nov 2021 10:05:02 -0400
Subject: [PATCH v11 2/2] Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++-
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 304 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 43 ++-
src/include/replication/basebackup_sink.h | 1 +
7 files changed, 415 insertions(+), 4 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 165a9ea5cc..9ce8b8d89d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,31 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--server-compression=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Allows the tar files generated for each tablespace to be compressed
+ on the server, before they are sent to the client. The default value
+ is <literal>none</literal>, which performs no compression. If set
+ to <literal>gzip</literal>, compression is performed using gzip and
+ the suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9</literal>
+ will provide the maximum compression that the <literal>gzip</literal>
+ algorithm can provide.
+ </para>
+ <para>
+ Since the write-ahead logs are fetched via a separate client
+ connection, they cannot be compressed using this option. See also
+ the <literal>--gzip</literal> and <literal>--compress</literal>
+ options.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-t <replaceable class="parameter">target</replaceable></option></term>
<term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
@@ -405,7 +430,9 @@ PostgreSQL documentation
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ automatically be added to all tar filenames. When this option is
+ used, compression is performed on the client side;
+ see also <literal>--server-compression</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/Makefile b/src/backend/Makefile
index add9560be4..4a02006788 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d32da51535..4bed0f18b7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -707,11 +715,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -881,6 +891,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -975,6 +1010,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt.compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt.compression_level);
+
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..432423bd55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,304 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index ec3b4f3c17..6ee49a5672 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -139,6 +139,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -373,13 +374,15 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
+ printf(_(" --server-compression=none|gzip|gzip[1-9]\n"
+ " compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
- printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -z, --gzip compress tar output on client\n"));
+ printf(_(" -Z, --compress=0-9 compress tar output on client with given compression level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -999,7 +1002,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1008,13 +1013,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
+ * However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1754,6 +1778,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2164,6 +2199,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2343,6 +2379,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 4acadf406d..d3276b2487 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
On Tue, Jan 18, 2022 at 9:43 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
The patch surely needs some grooming, but I am expecting some initial
review, especially in the area where we are trying to close the zstd
stream in bbsink_zstd_end_archive(). We need to tell the zstd library to
end the compression by calling ZSTD_compressStream2(), thereby sending
the ZSTD_e_end flag. But this also needs some input, which, per
example [1] line #686, I have taken as an empty ZSTD_inBuffer.
As far as I can see, this is correct. I found
https://zstd.docsforge.com/dev/api-documentation/#streaming-compression-howto
which seems to endorse what you've done here.
One (minor) thing that I notice: the way you've written the loop in
bbsink_zstd_end_archive(), I think it will typically call
bbsink_archive_contents() twice. It will flush whatever is already
present in the next sink's buffer as a result of the previous calls to
bbsink_zstd_archive_contents(), and then it will call
ZSTD_compressStream2(), which will partially refill the buffer you just
emptied, and then there will be nothing left in the internal buffer,
so it will call bbsink_archive_contents() again. But ... the initial
flush may not have been necessary: there may already have been enough
space in the output buffer for the ZSTD_compressStream2() call to
succeed without a prior flush. So maybe:
do
{
    yet_to_flush = ZSTD_compressStream2(..., ZSTD_e_end);
    /* check ZSTD_isError here */
    if (mysink->zstd_outBuf.pos > 0)
        bbsink_archive_contents();
} while (yet_to_flush > 0);
I believe this might be very slightly more efficient.
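Filled in, that loop might look roughly like the sketch below. This is
only a sketch: the cctx and zstd_outBuf member names are assumptions
based on your patch, and the empty ZSTD_inBuffer is the one you
described.

    ZSTD_inBuffer inBuf = {NULL, 0, 0};    /* no more input, just drain */
    size_t        yet_to_flush;

    do
    {
        /*
         * Ask zstd to finish the frame, writing into the next sink's
         * buffer.
         */
        yet_to_flush = ZSTD_compressStream2(mysink->cctx,
                                            &mysink->zstd_outBuf,
                                            &inBuf, ZSTD_e_end);
        if (ZSTD_isError(yet_to_flush))
            elog(ERROR, "could not compress data: %s",
                 ZSTD_getErrorName(yet_to_flush));

        /* Flush only when this round actually produced output. */
        if (mysink->zstd_outBuf.pos > 0)
        {
            bbsink_archive_contents(sink->bbs_next,
                                    mysink->zstd_outBuf.pos);
            mysink->zstd_outBuf.pos = 0;
        }
    } while (yet_to_flush > 0);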
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
I have added support for decompressing a gzip-compressed tar file
at the client. With this change, pg_basebackup can enable server-side
compression for plain-format backups.
Added a gzip extractor which decompresses the compressed archive
and forwards it to the next streamer. I have done initial testing and
am working on updating the test coverage.
Note: Before applying the patch, please apply Robert's v11 version
of the patches 0001 and 0002.
Thanks,
Dipesh
Attachments:
v1-0001-Support-for-extracting-gzip-compressed-archive.patch (text/x-patch)
From 737badce26ed05b5cdb64d9ffd1735fef9acbbf8 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Wed, 19 Jan 2022 17:11:45 +0530
Subject: [PATCH] Support for extracting gzip compressed archive
pg_basebackup can support server side compression using gzip. In
order to support plain format backup with option '-Fp' we need to
add support for decompressing the compressed blocks at client. This
patch addresses the extraction of gzip compressed blocks at client.
---
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_file.c | 175 ++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 58 +++++++++--
3 files changed, 225 insertions(+), 9 deletions(-)
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fc88b50..270b0df 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
+extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
index 77ca222..350af1d 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -37,6 +37,13 @@ typedef struct bbstreamer_gzip_writer
char *pathname;
gzFile gzfile;
} bbstreamer_gzip_writer;
+
+typedef struct bbstreamer_gzip_extractor
+{
+ bbstreamer base;
+ z_stream zstream;
+ size_t bytes_written;
+} bbstreamer_gzip_extractor;
#endif
typedef struct bbstreamer_extractor
@@ -76,6 +83,21 @@ const bbstreamer_ops bbstreamer_gzip_writer_ops = {
.finalize = bbstreamer_gzip_writer_finalize,
.free = bbstreamer_gzip_writer_free
};
+
+static void bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_extractor_free(bbstreamer *streamer);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbstreamer_ops bbstreamer_gzip_extractor_ops = {
+ .content = bbstreamer_gzip_extractor_content,
+ .finalize = bbstreamer_gzip_extractor_finalize,
+ .free = bbstreamer_gzip_extractor_free
+};
#endif
static void bbstreamer_extractor_content(bbstreamer *streamer,
@@ -349,6 +371,159 @@ get_gz_error(gzFile gzf)
#endif
/*
+ * Create a new base backup streamer that performs decompression of gzip
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_gzip_extractor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_extractor *streamer;
+ z_stream *zs;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_extractor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ zs = &streamer->zstream;
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) streamer->base.bbs_buffer.data;
+ zs->avail_out = streamer->base.bbs_buffer.maxlen;
+
+ /*
+ * Data compression was initialized using deflateInit2 to request a gzip
+ * header. Similarly, we are using inflateInit2 to initialize data
+ * decompression.
+ * "windowBits" must be greater than or equal to "windowBits" value
+ * provided to deflateInit2 while compressing.
+ */
+ if (inflateInit2(zs, 15 + 16) != Z_OK)
+ {
+ pg_log_error("could not initialize compression library");
+ exit(1);
+
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Decompress the input data into the output buffer until we run out of
+ * input data. Each time the output buffer fills up, invoke
+ * bbstreamer_content() to pass the decompressed data on to the next
+ * streamer.
+ */
+static void
+bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+ z_stream *zs = &mystreamer->zstream;
+ int res;
+
+
+ zs->next_in = (uint8 *) data;
+ zs->avail_in = len;
+
+ /* Process the current chunk */
+ while (zs->avail_in > 0)
+ {
+ Assert(mystreamer->bytes_written < mystreamer->base.bbs_buffer.maxlen);
+
+ zs->next_out = (uint8 *)
+ mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ zs->avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * Decompress data starting at zs->next_in, updating zs->next_in and
+ * zs->avail_in, and generate output starting at zs->next_out, updating
+ * zs->next_out and zs->avail_out accordingly.
+ */
+ res = inflate(zs, Z_NO_FLUSH);
+
+ if (res == Z_STREAM_ERROR)
+ pg_log_error("could not decompress data: %s", zs->msg);
+
+ mystreamer->bytes_written = mystreamer->base.bbs_buffer.maxlen - zs->avail_out;
+
+ /* If the output buffer is full, pass the content on to the next streamer */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member, mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
+ mystreamer->bytes_written = 0;
+ }
+ }
+
+ /*
+ * End of the stream, if there is some pending data in output buffers then
+ * we must forward it to next streamer.
+ */
+ if (res == Z_STREAM_END) {
+ bbstreamer_content(mystreamer->base.bbs_next, member, mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written, context);
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_gzip_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_gzip_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ bbstreamer_free(mystreamer->base.bbs_next);
+ pfree(mystreamer->base.bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+#endif
+
+/*
* Create a bbstreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 6ee49a5..b5e31aa 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -111,6 +111,12 @@ typedef enum
STREAM_WAL
} IncludeWal;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} compression_type;
+
/* Global options */
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
@@ -173,6 +179,10 @@ static int has_xlogendptr = 0;
static volatile LONG has_xlogendptr = 0;
#endif
+/* Server side compression method and compression level */
+static compression_type server_compression_type = BACKUP_COMPRESSION_NONE;
+static int server_compression_level = 0;
+
/* Contents of configuration file to be generated */
static PQExpBuffer recoveryconfcontents = NULL;
@@ -1028,15 +1038,23 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
/* At present, we only know how to parse tar archives. */
if (must_parse_archive && !is_tar)
{
- pg_log_error("unable to parse archive: %s", archive_name);
- pg_log_info("only tar archives can be parsed");
- if (format == 'p')
- pg_log_info("plain format requires pg_basebackup to parse the archive");
- if (inject_manifest)
- pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
- if (writerecoveryconf)
- pg_log_info("the -R option requires pg_basebackup to parse the archive");
- exit(1);
+ /*
+ * If the archive is compressed using a compression method other than
+ * gzip, we don't know how to extract it.
+ */
+ if (server_compression != NULL &&
+ server_compression_type != BACKUP_COMPRESSION_GZIP)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
}
if (format == 'p')
@@ -1136,6 +1154,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
+ /*
+ * Extract the gzip compressed archive using a gzip extractor and then
+ * forward it to next streamer.
+ */
+ if (format == 'p' && server_compression_type == BACKUP_COMPRESSION_GZIP)
+ streamer = bbstreamer_gzip_extractor_new(streamer);
+
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
return streamer;
@@ -2448,6 +2473,21 @@ main(int argc, char **argv)
exit(1);
}
+ if (server_compression != NULL)
+ {
+ if (strcmp(server_compression, "gzip") == 0)
+ server_compression_type = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(server_compression) == 5 &&
+ strncmp(server_compression, "gzip", 4) == 0 &&
+ server_compression[4] >= '1' && server_compression[4] <= '9')
+ {
+ server_compression_type = BACKUP_COMPRESSION_GZIP;
+ server_compression_level = server_compression[4] - '0';
+ }
+ }
+ else
+ server_compression_type = BACKUP_COMPRESSION_NONE;
+
/*
* Compression doesn't make sense unless tar format is in use.
*/
--
1.8.3.1
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
I have added support for decompressing a gzip-compressed tar file
at the client. With this change, pg_basebackup can enable server-side
compression for plain-format backups. Added a gzip extractor which
decompresses the compressed archive and forwards it to the next
streamer. I have done initial testing and am working on updating the
test coverage.
Cool. It's going to need some documentation changes, too.
I don't like the way you coded this in CreateBackupStreamer(). I would
like the decision about whether to use
bbstreamer_gzip_extractor_new(), and/or throw an error about not being
able to parse an archive, to be based on the file type, i.e. "did we
get a .tar.gz file?", rather than on whether we asked for server-side
compression. Notice that the existing logic checks whether we actually
got a .tar file from the server rather than assuming that's what must
have happened.
As a matter of style, I don't think it's good for the only thing
inside of an "if" statement to be another "if" statement. The two
could be merged, but we also don't want to have the "if" conditional
be too complex. I am imagining that this should end up saying
something like if (must_parse_archive && !is_tar && !is_tar_gz) {
pg_log_error(...
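Spelled out, that might look like the sketch below (just a sketch,
where is_tar_gz is a hypothetical new variable, computed the same way
the existing code computes is_tar):

    /* is_tar_gz is hypothetical; the patch does not define it yet. */
    is_tar_gz = (archive_name_len > 7 &&
                 strcmp(archive_name + archive_name_len - 7,
                        ".tar.gz") == 0);

    if (must_parse_archive && !is_tar && !is_tar_gz)
    {
        pg_log_error("unable to parse archive: %s", archive_name);
        pg_log_info("only tar archives can be parsed");
        if (format == 'p')
            pg_log_info("plain format requires pg_basebackup to parse the archive");
        if (inject_manifest)
            pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
        if (writerecoveryconf)
            pg_log_info("the -R option requires pg_basebackup to parse the archive");
        exit(1);
    }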
+ * "windowBits" must be greater than or equal to "windowBits" value
+ * provided to deflateInit2 while compressing.
It would be nice to clarify why we know the value we're using is safe.
Maybe we're using the maximum possible value, in which case you could
just add that to the end of the comment: "...so we use the maximum
possible value for safety."
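Concretely, assuming the maximum-window rationale is indeed the reason,
the comment and call might end up looking like this sketch:

    /*
     * The "windowBits" value given to inflateInit2 must be greater than
     * or equal to the value given to deflateInit2 when the data was
     * compressed.  Since we can't know here what the sender used, use
     * the maximum possible value for safety.  Adding 16 selects gzip
     * (rather than zlib) header handling, matching the server's
     * deflateInit2 call.
     */
    if (inflateInit2(zs, 15 + 16) != Z_OK)
    {
        pg_log_error("could not initialize compression library");
        exit(1);
    }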
+ /*
+ * End of the stream, if there is some pending data in output buffers then
+ * we must forward it to next streamer.
+ */
+ if (res == Z_STREAM_END) {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written, context);
+ }
Uncuddle the brace.
It probably doesn't make much difference, but I would be inclined to
do the final flush in bbstreamer_gzip_extractor_finalize() rather than
here. That way we rely on our own notion of when there's no more input
data rather than zlib's notion. Probably terrible things are going to
happen if those two ideas don't match up .... but there might be some
other compression algorithm that doesn't return a distinguishing code
at end-of-stream. Such an algorithm would have to take care of any
leftover data in the finalize function, so I think we should do that
here too, so the code can be similar in all cases.
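For instance, something like this rough sketch (reusing the names from
your patch; passing a NULL member and a BBSTREAMER_UNKNOWN context for
the trailing bytes is an assumption about what the next streamer will
accept):

    static void
    bbstreamer_gzip_extractor_finalize(bbstreamer *streamer)
    {
        bbstreamer_gzip_extractor *mystreamer =
            (bbstreamer_gzip_extractor *) streamer;

        /* Forward any decompressed bytes still staged in our buffer. */
        if (mystreamer->bytes_written > 0)
        {
            bbstreamer_content(mystreamer->base.bbs_next, NULL,
                               mystreamer->base.bbs_buffer.data,
                               mystreamer->bytes_written,
                               BBSTREAMER_UNKNOWN);
            mystreamer->bytes_written = 0;
        }

        bbstreamer_finalize(mystreamer->base.bbs_next);
    }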
Perhaps we should move all the gzip stuff to a new file bbstreamer_gzip.c.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 1/18/22 8:12 PM, Jeevan Ladhe wrote:
Similar to LZ4 server-side compression, I have also tried to add a ZSTD
server-side compression in the attached patch.
Thanks Jeevan. While testing, I found one scenario where the server
crashes while performing pg_basebackup with server-compression=zstd
against a large data set the second time.
Steps to reproduce
--PG sources (apply v11-0001, v11-0002, v9-0001, v9-0002; configure
--with-lz4 --with-zstd; make/install, initdb, start server)
--insert huge data (./pgbench -i -s 2000 postgres)
--restart the server (./pg_ctl -D data restart)
--pg_basebackup ( ./pg_basebackup -t server:/tmp/yc1
--server-compression=zstd -R -Xnone -n -N -l 'ccc' --no-estimate-size -v)
--insert huge data (./pgbench -i -s 1000 postgres)
--restart the server (./pg_ctl -D data restart)
--run pg_basebackup again (./pg_basebackup -t server:/tmp/yc11
--server-compression=zstd -v -Xnone )
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/yc11
--server-compression=zstd -v -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:23:26.508 IST [30219] LOG: checkpoint starting: force wait
2022-01-19 21:23:26.608 IST [30219] LOG: checkpoint complete: wrote 0
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001
s, sync=0.001 s, total=0.101 s; sync files=0, longest=0.000 s,
average=0.000 s; distance=16369 kB, estimate=16369 kB
pg_basebackup: checkpoint completed
TRAP: FailedAssertion("len > 0 && len <= sink->bbs_buffer_length", File:
"../../../src/include/replication/basebackup_sink.h", Line: 208, PID: 30226)
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(ExceptionalCondition+0x7a)[0x94ceca]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x7b9a08]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x7b9be2]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x7b5b30]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(SendBaseBackup+0x563)[0x7b7053]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(exec_replication_command+0x961)[0x7c9a41]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(PostgresMain+0x92f)[0x81ca3f]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x48e430]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(PostmasterMain+0xfd2)[0x785702]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(main+0x1c6)[0x48fb96]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63642c8555]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x48feb5]
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2022-01-19 21:25:34.485 IST [30205] LOG: server process (PID 30226) was
terminated by signal 6: Aborted
2022-01-19 21:25:34.485 IST [30205] DETAIL: Failed process was running:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TABLESPACE_MAP, TARGET 'server', TARGET_DETAIL '/tmp/yc11',
COMPRESSION 'zstd')
2022-01-19 21:25:34.485 IST [30205] LOG: terminating any other active
server processes
[edb@centos7tushar bin]$ 2022-01-19 21:25:34.489 IST [30205] LOG: all
server processes terminated; reinitializing
2022-01-19 21:25:34.536 IST [30228] LOG: database system was
interrupted; last known up at 2022-01-19 21:23:26 IST
2022-01-19 21:25:34.669 IST [30228] LOG: database system was not
properly shut down; automatic recovery in progress
2022-01-19 21:25:34.671 IST [30228] LOG: redo starts at 9/7000028
2022-01-19 21:25:34.671 IST [30228] LOG: invalid record length at
9/7000148: wanted 24, got 0
2022-01-19 21:25:34.671 IST [30228] LOG: redo done at 9/7000110 system
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-01-19 21:25:34.673 IST [30229] LOG: checkpoint starting:
end-of-recovery immediate wait
2022-01-19 21:25:34.713 IST [30229] LOG: checkpoint complete: wrote 3
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003
s, sync=0.001 s, total=0.041 s; sync files=2, longest=0.001 s,
average=0.001 s; distance=0 kB, estimate=0 kB
2022-01-19 21:25:34.718 IST [30205] LOG: database system is ready to
accept connections
Observation -
if we change the server-compression method from zstd to lz4, the crash
does NOT happen.
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/ycc1
--server-compression=lz4 -v -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:27:51.642 IST [30229] LOG: checkpoint starting: force wait
2022-01-19 21:27:51.687 IST [30229] LOG: checkpoint complete: wrote 0
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001
s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s,
average=0.000 s; distance=16383 kB, estimate=16383 kB
pg_basebackup: checkpoint completed
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: base backup completed
[edb@centos7tushar bin]$
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
I have done initial testing and am working on updating the test
coverage.
I spent some time thinking about test coverage for the server-side
backup code today and came up with the attached (v12-0003). It does an
end-to-end test that exercises server-side backup and server-side
compression and then untars the backup and validity-checks it using
pg_verifybackup. In addition to being good test coverage for these
patches, it also plugs a gap in the test coverage of pg_verifybackup,
which currently has no test case that untars a tar-format backup and
then verifies the result. I couldn't figure out a way to do that back
at the time I was working on pg_verifybackup, because I didn't think
we had any existing precedent for using 'tar' from a TAP test. But it
was pointed out to me that we do, so I used that as the model for this
test. It should be easy to generalize this test case to test lz4 and
zstd as well, I think. But I guess we'll still need something
different to test what your patch is doing.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v12-0002-Server-side-gzip-compression.patch (application/octet-stream)
From 6c0f223b3c2265fe59400aa6842424bc9717f601 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 5 Nov 2021 10:05:02 -0400
Subject: [PATCH v12 2/3] Server-side gzip compression.
pg_basebackup now has a --server-compression option, which can be
set to 'none' (the default), 'gzip', or 'gzipN' where N is a digit
between 1 and 9. If set to 'gzip' or 'gzipN' it will compress the
generated tar files on the server side using 'gzip', either at the
default compression level or at the compression level specified by N.
At present, pg_basebackup cannot decompress .gz files, so the
--server-compression option will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/ref/pg_basebackup.sgml | 29 ++-
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 39 +++
src/backend/replication/basebackup_gzip.c | 304 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 43 ++-
src/include/replication/basebackup_sink.h | 1 +
7 files changed, 415 insertions(+), 4 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 165a9ea5cc..9ce8b8d89d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,31 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--server-compression=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Allows the tar files generated for each tablespace to be compressed
+ on the server, before they are sent to the client. The default value
+ is <literal>none</literal>, which performs no compression. If set
+ to <literal>gzip</literal>, compression is performed using gzip and
+ the suffix <filename>.gz</filename> will automatically be added to
+ compressed files. A numeric digit between 1 and 9 can be added to
+ specify the compression level; for instance, <literal>gzip9</literal>
+ will provide the maximum compression that the <literal>gzip</literal>
+ algorithm can provide.
+ </para>
+ <para>
+ Since the write-ahead logs are fetched via a separate client
+ connection, they cannot be compressed using this option. See also
+ the <literal>--gzip</literal> and <literal>--compress</literal>
+ options.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-t <replaceable class="parameter">target</replaceable></option></term>
<term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
@@ -405,7 +430,9 @@ PostgreSQL documentation
compression level (0 through 9, 0 being no compression and 9 being best
compression). Compression is only available when using the tar
format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ automatically be added to all tar filenames. When this option is
+ used, compression is performed on the client side;
+ see also <literal>--server-compression</literal>.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/Makefile b/src/backend/Makefile
index add9560be4..4a02006788 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d32da51535..4bed0f18b7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -707,11 +715,13 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -881,6 +891,31 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(optval) == 5 && strncmp(optval, "gzip", 4) == 0 &&
+ optval[4] >= '1' && optval[4] <= '9')
+ {
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ opt->compression_level = optval[4] - '0';
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -975,6 +1010,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt.compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt.compression_level);
+
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..432423bd55
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,304 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index ec3b4f3c17..6ee49a5672 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -139,6 +139,7 @@ static bool verify_checksums = true;
static bool manifest = true;
static bool manifest_force_encode = false;
static char *manifest_checksums = NULL;
+static char *server_compression = NULL;
static bool success = false;
static bool made_new_pgdata = false;
@@ -373,13 +374,15 @@ usage(void)
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
printf(_(" -R, --write-recovery-conf\n"
" write configuration for replication\n"));
+ printf(_(" --server-compression=none|gzip|gzip[1-9]\n"
+ " compress backup on server\n"));
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
" relocate tablespace in OLDDIR to NEWDIR\n"));
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
- printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n"));
+ printf(_(" -z, --gzip compress tar output on client\n"));
+ printf(_(" -Z, --compress=0-9 compress tar output on client with given compression level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -999,7 +1002,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1008,13 +1013,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
+ * We have to parse the archive if (1) we're supposed to extract it, or if
+ * (2) we need to inject backup_manifest or recovery configuration into it.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1754,6 +1778,17 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (server_compression != NULL)
+ {
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", server_compression);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2164,6 +2199,7 @@ main(int argc, char **argv)
{"no-manifest", no_argument, NULL, 5},
{"manifest-force-encode", no_argument, NULL, 6},
{"manifest-checksums", required_argument, NULL, 7},
+ {"server-compression", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
int c;
@@ -2343,6 +2379,9 @@ main(int argc, char **argv)
case 7:
manifest_checksums = pg_strdup(optarg);
break;
+ case 8:
+ server_compression = pg_strdup(optarg);
+ break;
default:
/*
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 4acadf406d..d3276b2487 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
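A note for reviewers: the Z_NO_FLUSH/Z_FINISH dance in
bbsink_gzip_archive_contents() and bbsink_gzip_end_archive() above is
easier to follow in isolation. Here is a minimal standalone sketch of the
same zlib pattern -- my illustration, not part of the patch, with buffer
sizes and error handling simplified:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	static const char input[] = "example payload to gzip";
	unsigned char out[4096];
	z_stream	zs;

	memset(&zs, 0, sizeof(zs));

	/*
	 * 15 + 16 window bits requests a gzip header; 8 and Z_DEFAULT_STRATEGY
	 * are the deflateInit() defaults, as the patch's comment explains.
	 */
	if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8,
					 Z_DEFAULT_STRATEGY) != Z_OK)
		return 1;

	zs.next_in = (unsigned char *) input;
	zs.avail_in = sizeof(input) - 1;

	for (;;)
	{
		int			res;

		/* Fresh output space each round, as the sink does per buffer. */
		zs.next_out = out;
		zs.avail_out = sizeof(out);

		/*
		 * Z_NO_FLUSH while input remains, then Z_FINISH to drain zlib's
		 * internal buffers once the input is exhausted.
		 */
		res = deflate(&zs, zs.avail_in > 0 ? Z_NO_FLUSH : Z_FINISH);
		if (res == Z_STREAM_ERROR)
			return 1;

		/* The sink forwards these bytes via bbsink_archive_contents(). */
		fwrite(out, 1, sizeof(out) - zs.avail_out, stdout);

		if (res == Z_STREAM_END)
			break;
	}

	deflateEnd(&zs);
	return 0;
}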
Attachment: v12-0003-Test-server-side-backup-backup-compression-and-p.patch (application/octet-stream)
From 4054ad2eb75f18e7f1349a7db6b6e5828c320d63 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 19 Jan 2022 15:37:27 -0500
Subject: [PATCH v12 3/3] Test server-side backup, backup compression, and
pg_verifybackup.
---
src/bin/pg_verifybackup/Makefile | 7 ++
src/bin/pg_verifybackup/t/008_untar.pl | 104 +++++++++++++++++++++++++
2 files changed, 111 insertions(+)
create mode 100644 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index c07643b129..1ae818f9a1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -3,6 +3,13 @@
PGFILEDESC = "pg_verifybackup - verify a backup against using a backup manifest"
PGAPPICON = win32
+# make these available to TAP test scripts
+export TAR
+# Note that GZIP cannot be used directly as this environment variable is
+# used by the command "gzip" to pass down options, so stick with a different
+# name.
+export GZIP_PROGRAM=$(GZIP)
+
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
new file mode 100644
index 0000000000..85946cf380
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -0,0 +1,104 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test case aims to verify that server-side backups and server-side
+# backup compression work properly, and it also aims to verify that
+# pg_verifybackup can verify a base backup that didn't start out in plain
+# format.
+
+use strict;
+use warnings;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 6;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my $have_zlib = check_pg_config("#define HAVE_LIBZ 1");
+my $backup_path = $primary->backup_dir . '/server-backup';
+my $extract_path = $primary->backup_dir . '/extracted-backup';
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'backup_archive' => 'base.tar',
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--server-compression', 'gzip'],
+ 'backup_archive' => 'base.tar.gz',
+ 'decompress_program' => $ENV{'GZIP_PROGRAM'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => $have_zlib
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+ skip "no decompressor available for $method", 3
+ if exists $tc->{'decompress_program'} &&
+ !defined $tc->{'decompress_program'};
+
+ # Take a server-side backup.
+ my @backup = (
+ 'pg_basebackup', '--no-sync', '-cfast', '--target',
+ "server:$backup_path", '-Xfetch'
+ );
+ push @backup, @{$tc->{'backup_flags'}};
+ $primary->command_ok(\@backup,
+ "server side backup, compression $method");
+
+ # Verify that we got the files we expected.
+ my $backup_files = join(',',
+ sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
+ my $expected_backup_files = join(',',
+ sort ('backup_manifest', $tc->{'backup_archive'}));
+ is($backup_files, $expected_backup_files,
+ "found expected backup files, compression $method");
+
+ # Decompress.
+ if (exists $tc->{'decompress_program'})
+ {
+ my @decompress = ($tc->{'decompress_program'});
+ push @decompress, @{$tc->{'decompress_flags'}}
+ if $tc->{'decompress_flags'};
+ push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ system_or_bail(@decompress);
+ }
+
+ SKIP: {
+ my $tar = $ENV{TAR};
+ # don't check for a working tar here, to accommodate various odd
+ # cases such as AIX. If tar doesn't work the init_from_backup below
+ # will fail.
+ skip "no tar program available", 1
+ if (!defined $tar || $tar eq '');
+
+ # Untar.
+ mkdir($extract_path);
+ system_or_bail($tar, 'xf', $backup_path . '/base.tar',
+ '-C', $extract_path);
+
+ # Verify.
+ $primary->command_ok([ 'pg_verifybackup', '-n',
+ '-m', "$backup_path/backup_manifest", '-e', $extract_path ],
+ "verify backup, compression $method");
+ }
+
+ # Cleanup.
+ unlink($backup_path . '/backup_manifest');
+ unlink($backup_path . '/base.tar');
+ rmtree($extract_path);
+ }
+}
--
2.24.3 (Apple Git-128)
v12-0001-Support-base-backup-targets.patchapplication/octet-stream; name=v12-0001-Support-base-backup-targets.patchDownload
From 21dbbdf9ee7125f98aff2deb4ebcf401b6414dad Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 16 Nov 2021 15:20:50 -0500
Subject: [PATCH v12 1/3] Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets,
at least for now, must use -Xnone.
Patch by me, with a bug fix by Jeevan Ladhe.
---
doc/src/sgml/protocol.sgml | 23 +-
doc/src/sgml/ref/pg_basebackup.sgml | 30 ++
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 81 ++++-
src/backend/replication/basebackup_copy.c | 21 +-
src/backend/replication/basebackup_server.c | 302 +++++++++++++++++++
src/backend/utils/activity/wait_event.c | 6 +
src/bin/pg_basebackup/pg_basebackup.c | 208 ++++++++++---
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 64 +++-
src/include/replication/basebackup_sink.h | 3 +-
src/include/utils/wait_event.h | 2 +
11 files changed, 677 insertions(+), 64 deletions(-)
create mode 100644 src/backend/replication/basebackup_server.c
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 7e59edb1cc..cd6dca691e 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2640,8 +2640,27 @@ The commands accepted in replication mode are:
</para>
<para>
- At present, the only supported value for this parameter is
- <literal>client</literal>.
+ If the target is <literal>client</literal>, the backup data is
+ sent to the client. If it is <literal>server</literal>, the backup
+ data is written to the server at the pathname specified by the
+ <literal>TARGET_DETAIL</literal> option. If it is
+ <literal>blackhole</literal>, the backup data is not sent
+ anywhere; it is simply discarded.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>TARGET_DETAIL</literal> <replaceable>'detail'</replaceable></term>
+ <listitem>
+ <para>
+ Provides additional information about the backup target.
+ </para>
+
+ <para>
+ Currently, this option can only be used when the backup target is
+ <literal>server</literal>. It specifies the server directory
+ to which the backup should be written.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 9e6807b457..165a9ea5cc 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -224,6 +224,36 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">target</replaceable></option></term>
+ <term><option>--target=<replaceable class="parameter">target</replaceable></option></term>
+ <listitem>
+
+ <para>
+ Instructs the server where to place the base backup. The default target
+ is <literal>client</literal>, which specifies that the backup should
+ be sent to the machine where <application>pg_basebackup</application>
+ is running. If the target is instead set to
+ <literal>server:/some/path</literal>, the backup will be stored on
+ the machine where the server is running in the
+ <literal>/some/path</literal> directory. Storing a backup on the
+ server requires superuser privileges. Setting the target to
+ <literal>blackhole</literal> causes the contents of the backup to be
+ discarded and not stored anywhere. This should only be used for
+ testing purposes, as you will not end up with an actual backup.
+ </para>
+
+ <para>
+ Since WAL streaming is implemented by
+ <application>pg_basebackup</application> rather than by the server,
+ this option cannot be used together with <literal>-Xstream</literal>.
+ Since that is the default, when this option is specified, you must also
+ specify either <literal>-Xfetch</literal> or <literal>-Xnone</literal>.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
<term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74b97cf126..a8f4757f0c 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_progress.o \
+ basebackup_server.o \
basebackup_sink.o \
basebackup_throttle.o \
repl_gram.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3afbbe7e02..d32da51535 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -55,8 +55,10 @@
typedef enum
{
+ BACKUP_TARGET_BLACKHOLE,
BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT
+ BACKUP_TARGET_CLIENT,
+ BACKUP_TARGET_SERVER
} backup_target_type;
typedef struct
@@ -69,6 +71,7 @@ typedef struct
uint32 maxrate;
bool sendtblspcmapfile;
backup_target_type target;
+ char *target_detail;
backup_manifest_option manifest;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -702,6 +705,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest = false;
bool o_manifest_checksums = false;
bool o_target = false;
+ bool o_target_detail = false;
+ char *target_str = "compat"; /* placate compiler */
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
@@ -847,25 +852,35 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- char *optval = defGetString(defel);
+ target_str = defGetString(defel);
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "client") == 0)
+ if (strcmp(target_str, "blackhole") == 0)
+ opt->target = BACKUP_TARGET_BLACKHOLE;
+ else if (strcmp(target_str, "client") == 0)
opt->target = BACKUP_TARGET_CLIENT;
+ else if (strcmp(target_str, "server") == 0)
+ opt->target = BACKUP_TARGET_SERVER;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", optval)));
+ errmsg("unrecognized target: \"%s\"", target_str)));
o_target = true;
}
- else
- ereport(ERROR,
- errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname));
+ else if (strcmp(defel->defname, "target_detail") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_target_detail)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->target_detail = optval;
+ o_target_detail = true;
+ }
}
if (opt->label == NULL)
opt->label = "base backup";
@@ -877,6 +892,22 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
+ if (opt->target == BACKUP_TARGET_SERVER)
+ {
+ if (opt->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target_str)));
+ }
+ else
+ {
+ if (opt->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target_str)));
+ }
}
@@ -908,14 +939,38 @@ SendBaseBackup(BaseBackupCmd *cmd)
/*
* If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If not, we must fall back to the old and less capable
- * copy-tablespace protocol.
+ * protocol. If the target is specifically 'client' then set up to stream
+ * the backup to the client; otherwise, it's being sent someplace else and
+ * should not be sent to the client.
+ *
+ * If the TARGET option was not specified, we must fall back to the older
+ * and less capable copy-tablespace protocol.
*/
- if (opt.target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new();
+ if (opt.target == BACKUP_TARGET_CLIENT)
+ sink = bbsink_copystream_new(true);
+ else if (opt.target != BACKUP_TARGET_COMPAT)
+ sink = bbsink_copystream_new(false);
else
sink = bbsink_copytblspc_new();
+ /*
+ * If a non-default backup target is in use, arrange to send the data
+ * wherever it needs to go.
+ */
+ switch (opt.target)
+ {
+ case BACKUP_TARGET_BLACKHOLE:
+ /* Nothing to do, just discard data. */
+ break;
+ case BACKUP_TARGET_COMPAT:
+ case BACKUP_TARGET_CLIENT:
+ /* Nothing to do, handling above is sufficient. */
+ break;
+ case BACKUP_TARGET_SERVER:
+ sink = bbsink_server_new(sink, opt.target_detail);
+ break;
+ }
+
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
diff --git a/src/backend/replication/basebackup_copy.c b/src/backend/replication/basebackup_copy.c
index 8dfdff0644..938e19a6a4 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -44,6 +44,9 @@ typedef struct bbsink_copystream
/* Common information for all types of sink. */
bbsink base;
+ /* Are we sending the archives to the client, or somewhere else? */
+ bool send_to_client;
+
/*
* Protocol message buffer. We assemble CopyData protocol messages by
* setting the first character of this buffer to 'd' (archive or manifest
@@ -131,11 +134,12 @@ const bbsink_ops bbsink_copytblspc_ops = {
* Create a new 'copystream' bbsink.
*/
bbsink *
-bbsink_copystream_new(void)
+bbsink_copystream_new(bool send_to_client)
{
bbsink_copystream *sink = palloc0(sizeof(bbsink_copystream));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_copystream_ops;
+ sink->send_to_client = send_to_client;
/* Set up for periodic progress reporting. */
sink->last_progress_report_time = GetCurrentTimestamp();
@@ -212,8 +216,12 @@ bbsink_copystream_archive_contents(bbsink *sink, size_t len)
StringInfoData buf;
uint64 targetbytes;
- /* Send the archive content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ /* Send the archive content to the client, if appropriate. */
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
/* Consider whether to send a progress report to the client. */
targetbytes = mysink->bytes_done_at_last_time_check
@@ -294,8 +302,11 @@ bbsink_copystream_manifest_contents(bbsink *sink, size_t len)
{
bbsink_copystream *mysink = (bbsink_copystream *) sink;
- /* Send the manifest content to the client (with leading type byte). */
- pq_putmessage('d', mysink->msgbuffer, len + 1);
+ if (mysink->send_to_client)
+ {
+ /* Add one because we're also sending a leading type byte. */
+ pq_putmessage('d', mysink->msgbuffer, len + 1);
+ }
}
/*
diff --git a/src/backend/replication/basebackup_server.c b/src/backend/replication/basebackup_server.c
new file mode 100644
index 0000000000..ce1b7b4797
--- /dev/null
+++ b/src/backend/replication/basebackup_server.c
@@ -0,0 +1,302 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_server.c
+ * store basebackup archives on the server
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_server.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "replication/basebackup.h"
+#include "replication/basebackup_sink.h"
+#include "storage/fd.h"
+#include "utils/timestamp.h"
+#include "utils/wait_event.h"
+
+typedef struct bbsink_server
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Directory in which backup is to be stored. */
+ char *pathname;
+
+ /* Currently open file (or 0 if nothing open). */
+ File file;
+
+ /* Current file position. */
+ off_t filepos;
+} bbsink_server;
+
+static void bbsink_server_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_server_archive_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_archive(bbsink *sink);
+static void bbsink_server_begin_manifest(bbsink *sink);
+static void bbsink_server_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_server_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_server_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_server_begin_archive,
+ .archive_contents = bbsink_server_archive_contents,
+ .end_archive = bbsink_server_end_archive,
+ .begin_manifest = bbsink_server_begin_manifest,
+ .manifest_contents = bbsink_server_manifest_contents,
+ .end_manifest = bbsink_server_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+/*
+ * Create a new 'server' bbsink.
+ */
+bbsink *
+bbsink_server_new(bbsink *next, char *pathname)
+{
+ bbsink_server *sink = palloc0(sizeof(bbsink_server));
+
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_server_ops;
+ sink->pathname = pathname;
+ sink->base.bbs_next = next;
+
+ /* Replication permission is not sufficient in this case. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to create server backup")));
+
+ /*
+ * It's not a good idea to store your backups in the same directory that
+ * you're backing up. If we allowed a relative path here, that could easily
+ * happen accidentally, so we don't. The user could still accomplish the
+ * same thing by including the absolute path to $PGDATA in the pathname,
+ * but that's likely an intentional bad decision rather than an accident.
+ */
+ if (!is_absolute_path(pathname))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for server backup")));
+
+ switch (pg_check_dir(pathname))
+ {
+ case 0:
+ /*
+ * Does not exist, so create it using the same permissions we'd use
+ * for a new subdirectory of the data directory itself.
+ */
+ if (MakePGDirectory(pathname) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", pathname)));
+ break;
+
+ case 1:
+ /* Exists, empty. */
+ break;
+
+ case 2:
+ case 3:
+ case 4:
+ /* Exists, not empty. */
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_FILE),
+ errmsg("directory \"%s\" exists but is not empty",
+ pathname)));
+ break;
+
+ default:
+ /* Access problem. */
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not access directory \"%s\": %m",
+ pathname)));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Open the correct output file for this archive.
+ */
+static void
+bbsink_server_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *filename;
+
+ Assert(mysink->file == 0);
+ Assert(mysink->filepos == 0);
+
+ filename = psprintf("%s/%s", mysink->pathname, archive_name);
+
+ mysink->file = PathNameOpenFile(filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", filename)));
+
+ pfree(filename);
+
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Write the data to the output file.
+ */
+static void
+bbsink_server_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * fsync and close the current output file.
+ */
+static void
+bbsink_server_end_archive(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+
+ /*
+ * We intentionally don't use data_sync_elevel here, because the server
+ * shouldn't PANIC just because we can't guarantee that the backup has been
+ * written down to disk. Running recovery won't fix anything in this case
+ * anyway.
+ */
+ if (FileSync(mysink->file, WAIT_EVENT_BASEBACKUP_SYNC) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+ FilePathName(mysink->file))));
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+ mysink->filepos = 0;
+
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Open the output file to which we will write the manifest.
+ *
+ * Just like pg_basebackup, we write the manifest first under a temporary
+ * name and then rename it into place after fsync. That way, if the manifest
+ * is there and under the correct name, the user can be sure that the backup
+ * completed.
+ */
+static void
+bbsink_server_begin_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+
+ Assert(mysink->file == 0);
+
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+
+ mysink->file = PathNameOpenFile(tmp_filename,
+ O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+ if (mysink->file <= 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", tmp_filename)));
+
+ pfree(tmp_filename);
+
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Write a chunk of manifest data to the output file.
+ */
+static void
+bbsink_server_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ int nbytes;
+
+ nbytes = FileWrite(mysink->file, mysink->base.bbs_buffer, len,
+ mysink->filepos, WAIT_EVENT_BASEBACKUP_WRITE);
+
+ if (nbytes != len)
+ {
+ if (nbytes < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write file \"%s\": %m",
+ FilePathName(mysink->file)),
+ errhint("Check free disk space.")));
+ /* short write: complain appropriately */
+ ereport(ERROR,
+ (errcode(ERRCODE_DISK_FULL),
+ errmsg("could not write file \"%s\": wrote only %d of %d bytes at offset %u",
+ FilePathName(mysink->file),
+ nbytes, (int) len, (unsigned) mysink->filepos),
+ errhint("Check free disk space.")));
+ }
+
+ mysink->filepos += nbytes;
+
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * fsync the backup manifest, close the file, and then rename it into place.
+ */
+static void
+bbsink_server_end_manifest(bbsink *sink)
+{
+ bbsink_server *mysink = (bbsink_server *) sink;
+ char *tmp_filename;
+ char *filename;
+
+ /* We're done with this file now. */
+ FileClose(mysink->file);
+ mysink->file = 0;
+
+ /*
+ * Rename it into place. This also fsyncs the temporary file, so we don't
+ * need to do that here. We don't use data_sync_elevel here for the same
+ * reasons as in bbsink_server_end_archive.
+ */
+ tmp_filename = psprintf("%s/backup_manifest.tmp", mysink->pathname);
+ filename = psprintf("%s/backup_manifest", mysink->pathname);
+ durable_rename(tmp_filename, filename, ERROR);
+ pfree(filename);
+ pfree(tmp_filename);
+
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 0f5f18f02e..021b83de7a 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -522,6 +522,12 @@ pgstat_get_wait_io(WaitEventIO w)
case WAIT_EVENT_BASEBACKUP_READ:
event_name = "BaseBackupRead";
break;
+ case WAIT_EVENT_BASEBACKUP_SYNC:
+ event_name = "BaseBackupSync";
+ break;
+ case WAIT_EVENT_BASEBACKUP_WRITE:
+ event_name = "BaseBackupWrite";
+ break;
case WAIT_EVENT_BUFFILE_READ:
event_name = "BufFileRead";
break;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2a58be638a..ec3b4f3c17 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -115,7 +115,7 @@ typedef enum
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
static char *xlog_dir = NULL;
-static char format = 'p'; /* p(lain)/t(ar) */
+static char format = '\0'; /* p(lain)/t(ar) */
static char *label = "pg_basebackup base backup";
static bool noclean = false;
static bool checksum_failure = false;
@@ -132,6 +132,7 @@ static pg_time_t last_progress_report = 0;
static int32 maxrate = 0; /* no limit by default */
static char *replication_slot = NULL;
static bool temp_replication_slot = true;
+static char *backup_target = NULL;
static bool create_slot = false;
static bool no_slot = false;
static bool verify_checksums = true;
@@ -364,6 +365,8 @@ usage(void)
printf(_("Usage:\n"));
printf(_(" %s [OPTION]...\n"), progname);
printf(_("\nOptions controlling the output:\n"));
+ printf(_(" -t, --target=TARGET[:DETAIL]\n"
+ " backup target (if other than client)\n"));
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
@@ -1232,15 +1235,22 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
}
/*
- * Create an appropriate backup streamer. We know that
- * recovery GUCs are supported, because this protocol can only
- * be used on v15+.
+ * Create an appropriate backup streamer, unless a backup
+ * target was specified. In that case, it's up to the server
+ * to put the backup wherever it needs to go.
*/
- state->streamer =
- CreateBackupStreamer(archive_name,
- spclocation,
- &state->manifest_inject_streamer,
- true, false);
+ if (backup_target == NULL)
+ {
+ /*
+ * We know that recovery GUCs are supported, because this
+ * protocol can only be used on v15+.
+ */
+ state->streamer =
+ CreateBackupStreamer(archive_name,
+ spclocation,
+ &state->manifest_inject_streamer,
+ true, false);
+ }
break;
}
@@ -1312,24 +1322,32 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
GetCopyDataEnd(r, copybuf, cursor);
/*
- * If we're supposed inject the manifest into the archive, we
- * prepare to buffer it in memory; otherwise, we prepare to
- * write it to a temporary file.
+ * If a backup target was specified, figuring out where to put
+ * the manifest is the server's problem. Otherwise, we need to
+ * deal with it.
*/
- if (state->manifest_inject_streamer != NULL)
- state->manifest_buffer = createPQExpBuffer();
- else
+ if (backup_target == NULL)
{
- snprintf(state->manifest_filename,
- sizeof(state->manifest_filename),
- "%s/backup_manifest.tmp", basedir);
- state->manifest_file =
- fopen(state->manifest_filename, "wb");
- if (state->manifest_file == NULL)
+ /*
+ * If we're supposed to inject the manifest into the archive,
+ * we prepare to buffer it in memory; otherwise, we
+ * prepare to write it to a temporary file.
+ */
+ if (state->manifest_inject_streamer != NULL)
+ state->manifest_buffer = createPQExpBuffer();
+ else
{
- pg_log_error("could not create file \"%s\": %m",
- state->manifest_filename);
- exit(1);
+ snprintf(state->manifest_filename,
+ sizeof(state->manifest_filename),
+ "%s/backup_manifest.tmp", basedir);
+ state->manifest_file =
+ fopen(state->manifest_filename, "wb");
+ if (state->manifest_file == NULL)
+ {
+ pg_log_error("could not create file \"%s\": %m",
+ state->manifest_filename);
+ exit(1);
+ }
}
}
break;
@@ -1698,13 +1716,41 @@ BaseBackup(void)
if (manifest)
{
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
- manifest_force_encode ? "force-encode" : "yes");
+ manifest_force_encode ? "force-encode" : "yes");
if (manifest_checksums != NULL)
AppendStringCommandOption(&buf, use_new_option_syntax,
- "MANIFEST_CHECKSUMS", manifest_checksums);
+ "MANIFEST_CHECKSUMS", manifest_checksums);
}
- if (serverMajor >= 1500)
+ if (backup_target != NULL)
+ {
+ char *colon;
+
+ if (serverMajor < 1500)
+ {
+ pg_log_error("backup targets are not supported by this server version");
+ exit(1);
+ }
+
+ AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
+
+ if ((colon = strchr(backup_target, ':')) == NULL)
+ {
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", backup_target);
+ }
+ else
+ {
+ char *target;
+
+ target = pnstrdup(backup_target, colon - backup_target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET", target);
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "TARGET_DETAIL", colon + 1);
+ }
+ }
+ else if (serverMajor >= 1500)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
@@ -1799,8 +1845,13 @@ BaseBackup(void)
* Verify tablespace directories are empty. Don't bother with the
* first one since it can be relocated, and it will be checked before
* we do anything anyway.
+ *
+ * Note that this is skipped for tar format backups and backups that
+ * the server is storing to a target location, since in that case
+ * we won't be storing anything into these directories and thus should
+ * not create them.
*/
- if (format == 'p' && !PQgetisnull(res, i, 1))
+ if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
{
char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1)));
@@ -1811,7 +1862,8 @@ BaseBackup(void)
/*
* When writing to stdout, require a single tablespace
*/
- writing_to_stdout = format == 't' && strcmp(basedir, "-") == 0;
+ writing_to_stdout = format == 't' && basedir != NULL &&
+ strcmp(basedir, "-") == 0;
if (writing_to_stdout && PQntuples(res) > 1)
{
pg_log_error("can only write single tablespace to stdout, database has %d",
@@ -1894,7 +1946,7 @@ BaseBackup(void)
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
- pg_log_error("could not get write-ahead log end position from server: %s",
+ pg_log_error("backup failed: %s",
PQerrorMessage(conn));
exit(1);
}
@@ -2028,8 +2080,11 @@ BaseBackup(void)
* synced after being completed. In plain format, all the data of the
* base directory is synced, taking into account all the tablespaces.
* Errors are not considered fatal.
+ *
+ * If, however, there's a backup target, we're not writing anything
+ * locally, so in that case we skip this step.
*/
- if (do_sync)
+ if (do_sync && backup_target == NULL)
{
if (verbose)
pg_log_info("syncing data to disk ...");
@@ -2051,7 +2106,7 @@ BaseBackup(void)
* without a backup_manifest file, decreasing the chances that a directory
* we leave behind will be mistaken for a valid backup.
*/
- if (!writing_to_stdout && manifest)
+ if (!writing_to_stdout && manifest && backup_target == NULL)
{
char tmp_filename[MAXPGPATH];
char filename[MAXPGPATH];
@@ -2085,6 +2140,7 @@ main(int argc, char **argv)
{"max-rate", required_argument, NULL, 'r'},
{"write-recovery-conf", no_argument, NULL, 'R'},
{"slot", required_argument, NULL, 'S'},
+ {"target", required_argument, NULL, 't'},
{"tablespace-mapping", required_argument, NULL, 'T'},
{"wal-method", required_argument, NULL, 'X'},
{"gzip", no_argument, NULL, 'z'},
@@ -2135,7 +2191,7 @@ main(int argc, char **argv)
atexit(cleanup_directories_atexit);
- while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+ while ((c = getopt_long(argc, argv, "CD:F:r:RS:t:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
long_options, &option_index)) != -1)
{
switch (c)
@@ -2176,6 +2232,9 @@ main(int argc, char **argv)
case 2:
no_slot = true;
break;
+ case 't':
+ backup_target = pg_strdup(optarg);
+ break;
case 'T':
tablespace_list_append(optarg);
break;
@@ -2308,27 +2367,72 @@ main(int argc, char **argv)
}
/*
- * Required arguments
+ * Setting the backup target to 'client' is equivalent to leaving out the
+ * option. This logic allows us to assume elsewhere that the backup is
+ * being stored locally if and only if backup_target == NULL.
+ */
+ if (backup_target != NULL && strcmp(backup_target, "client") == 0)
+ {
+ pg_free(backup_target);
+ backup_target = NULL;
+ }
+
+ /*
+ * Can't use --format with --target. Without --target, default format is
+ * tar.
+ */
+ if (backup_target != NULL && format != '\0')
+ {
+ pg_log_error("cannot specify both format and backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (format == '\0')
+ format = 'p';
+
+ /*
+ * Either directory or backup target should be specified, but not both
*/
- if (basedir == NULL)
+ if (basedir == NULL && backup_target == NULL)
{
- pg_log_error("no target directory specified");
+ pg_log_error("must specify output directory or backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ if (basedir != NULL && backup_target != NULL)
+ {
+ pg_log_error("cannot specify both output directory and backup target");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
/*
- * Mutually exclusive arguments
+ * Compression doesn't make sense unless tar format is in use.
*/
if (format == 'p' && compresslevel != 0)
{
- pg_log_error("only tar mode backups can be compressed");
+ if (backup_target == NULL)
+ pg_log_error("only tar mode backups can be compressed");
+ else
+ pg_log_error("client-side compression is not possible when a backup target is specfied");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
+ /*
+ * Sanity checks for WAL method.
+ */
+ if (backup_target != NULL && includewal == STREAM_WAL)
+ {
+ pg_log_error("WAL cannot be streamed when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
{
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
@@ -2345,6 +2449,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for replication slot options.
+ */
if (no_slot)
{
if (replication_slot)
@@ -2378,8 +2485,18 @@ main(int argc, char **argv)
}
}
+ /*
+ * Sanity checks on WAL directory.
+ */
if (xlog_dir)
{
+ if (backup_target != NULL)
+ {
+ pg_log_error("WAL directory location cannot be specified along with a backup target");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
if (format != 'p')
{
pg_log_error("WAL directory location can only be specified in plain mode");
@@ -2400,6 +2517,7 @@ main(int argc, char **argv)
}
#ifndef HAVE_LIBZ
+ /* Sanity checks for compression level. */
if (compresslevel != 0)
{
pg_log_error("this build does not support compression");
@@ -2407,6 +2525,9 @@ main(int argc, char **argv)
}
#endif
+ /*
+ * Sanity checks for progress reporting options.
+ */
if (showprogress && !estimatesize)
{
pg_log_error("%s and %s are incompatible options",
@@ -2416,6 +2537,9 @@ main(int argc, char **argv)
exit(1);
}
+ /*
+ * Sanity checks for backup manifest options.
+ */
if (!manifest && manifest_checksums != NULL)
{
pg_log_error("%s and %s are incompatible options",
@@ -2458,11 +2582,11 @@ main(int argc, char **argv)
manifest = false;
/*
- * Verify that the target directory exists, or create it. For plaintext
- * backups, always require the directory. For tar backups, require it
- * unless we are writing to stdout.
+ * If an output directory was specified, verify that it exists, or create
+ * it. Note that for a tar backup, an output directory of "-" means we are
+ * writing to stdout, so do nothing in that case.
*/
- if (format == 'p' || strcmp(basedir, "-") != 0)
+ if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
/* determine remote server's xlog segment size */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f0243f28d4..f7e21941eb 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@ use File::Path qw(rmtree);
use Fcntl qw(:seek);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 115;
+use Test::More tests => 135;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -474,6 +474,68 @@ $node->command_ok(
],
'pg_basebackup -X stream runs with --no-slot');
rmtree("$tempdir/backupnoslot");
+$node->command_ok(
+ [ @pg_basebackup_defs, '-D', "$tempdir/backupxf", '-X', 'fetch' ],
+ 'pg_basebackup -X fetch runs');
+
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole' ],
+ qr/WAL cannot be streamed when a backup target is specified/,
+ 'backup target requires -X');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'stream' ],
+ qr/WAL cannot be streamed when a backup target is specified/,
+ 'backup target requires -X other than -X stream');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'bogus', '-X', 'none' ],
+ qr/unrecognized target/,
+ 'backup target unrecognized');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'none', '-D', "$tempdir/blackhole" ],
+ qr/cannot specify both output directory and backup target/,
+ 'backup target and output directory');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'none', '-Ft' ],
+ qr/cannot specify both format and backup target/,
+ 'backup target and format');
+$node->command_ok(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-X', 'none' ],
+ 'backup target blackhole');
+$node->command_ok(
+ [ @pg_basebackup_defs, '--target', "server:$tempdir/backuponserver", '-X', 'none' ],
+ 'backup target server');
+ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created');
+rmtree("$tempdir/backuponserver");
+
+$node->command_fails(
+ [
+ @pg_basebackup_defs, '-D',
+ "$tempdir/backupxs_sl_fail", '-X',
+ 'stream', '-S',
+ 'slot0'
+ ],
+ 'pg_basebackup fails with nonexistent replication slot');
+
+$node->command_fails(
+ [ @pg_basebackup_defs, '-D', "$tempdir/backupxs_slot", '-C' ],
+ 'pg_basebackup -C fails without slot name');
+
+$node->command_fails(
+ [
+ @pg_basebackup_defs, '-D',
+ "$tempdir/backupxs_slot", '-C',
+ '-S', 'slot0',
+ '--no-slot'
+ ],
+ 'pg_basebackup fails with -C -S --no-slot');
+$node->command_fails_like(
+ [ @pg_basebackup_defs, '--target', 'blackhole', '-D', "$tempdir/blackhole" ],
+ qr/cannot specify both output directory and backup target/,
+ 'backup target and output directory');
+
+$node->command_ok(
+ [ @pg_basebackup_defs, '-D', "$tempdir/backuptr/co", '-X', 'none' ],
+ 'pg_basebackup -X none runs');
$node->command_fails(
[
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 25436defa8..4acadf406d 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -282,9 +282,10 @@ extern void bbsink_forward_end_backup(bbsink *sink, XLogRecPtr endptr,
extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
-extern bbsink *bbsink_copystream_new(void);
+extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
+extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
/* Extra interface functions for progress reporting. */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index e0b2f56f47..395d325c5f 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -157,6 +157,8 @@ typedef enum
typedef enum
{
WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+ WAIT_EVENT_BASEBACKUP_SYNC,
+ WAIT_EVENT_BASEBACKUP_WRITE,
WAIT_EVENT_BUFFILE_READ,
WAIT_EVENT_BUFFILE_WRITE,
WAIT_EVENT_BUFFILE_TRUNCATE,
--
2.24.3 (Apple Git-128)
Hi,
Thanks for the feedback. I have incorporated the suggestions and
attached an updated patch, v2.
I spent some time thinking about test coverage for the server-side
backup code today and came up with the attached (v12-0003). It does an
end-to-end test that exercises server-side backup and server-side
compression and then untars the backup and validity-checks it using
pg_verifybackup. In addition to being good test coverage for these
patches, it also plugs a gap in the test coverage of pg_verifybackup,
which currently has no test case that untars a tar-format backup and
then verifies the result. I couldn't figure out a way to do that back
at the time I was working on pg_verifybackup, because I didn't think
we had any existing precedent for using 'tar' from a TAP test. But it
was pointed out to me that we do, so I used that as the model for this
test. It should be easy to generalize this test case to test lz4 and
zstd as well, I think. But I guess we'll still need something
different to test what your patch is doing.
I tried to add test coverage for server-side gzip compression with a
plain format backup, using pg_verifybackup. I have modified the test
to use a flag specific to plain format: if this flag is set, the test
takes a plain format backup (with server compression enabled) and
verifies it using pg_verifybackup. This test coverage is included in
v2-0002.
It's going to need some documentation changes, too.
Yes, I am working on it.
Note: Before applying these patches, please apply Robert's v12
versions of patches 0001, 0002, and 0003.
Thanks,
Dipesh
Attachments:
Attachment: v2-0001-Support-for-extracting-gzip-compressed-archive.patch (text/x-patch)
From 826a1cbb639afb7e10a20955d3ec64b1bab1fa80 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Thu, 20 Jan 2022 16:38:36 +0530
Subject: [PATCH 1/2] Support for extracting gzip compressed archive
pg_basebackup supports server-side compression using gzip. In order
to support plain format backups (option '-Fp'), the client needs to
be able to decompress the compressed archive it receives. This patch
adds support for extracting gzip-compressed archives on the client.
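The decompression side mirrors the server's deflate loop. Roughly, as a
standalone sketch (my illustration, not the patch's bbstreamer code --
the real extractor forwards the decompressed bytes to the next
bbstreamer rather than to stdout, and inflate_chunk is a hypothetical
name):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* Inflate one chunk of gzip data as it arrives from the server. */
static int
inflate_chunk(z_stream *zs, const unsigned char *data, size_t len)
{
	unsigned char out[4096];

	zs->next_in = (unsigned char *) data;
	zs->avail_in = len;

	while (zs->avail_in > 0)
	{
		int			res;

		zs->next_out = out;
		zs->avail_out = sizeof(out);

		res = inflate(zs, Z_NO_FLUSH);
		if (res != Z_OK && res != Z_STREAM_END)
			return -1;

		/* Forward whatever output this round produced. */
		fwrite(out, 1, sizeof(out) - zs->avail_out, stdout);

		if (res == Z_STREAM_END)
			break;
	}
	return 0;
}

int
main(void)
{
	z_stream	zs;

	memset(&zs, 0, sizeof(zs));
	/* 15 + 16 window bits: expect a gzip header, matching the server. */
	if (inflateInit2(&zs, 15 + 16) != Z_OK)
		return 1;
	/* ... feed compressed chunks with inflate_chunk(&zs, buf, n) ... */
	inflateEnd(&zs);
	return 0;
}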
---
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_file.c | 182 ---------------
src/bin/pg_basebackup/bbstreamer_gzip.c | 377 ++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 43 +++-
5 files changed, 419 insertions(+), 185 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_gzip.c
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 5b18851..78d96c6 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -38,6 +38,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
bbstreamer_file.o \
+ bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_tar.o
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fc88b50..270b0df 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
+extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
index 77ca222..d721f87 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -11,10 +11,6 @@
#include "postgres_fe.h"
-#ifdef HAVE_LIBZ
-#include <zlib.h>
-#endif
-
#include <unistd.h>
#include "bbstreamer.h"
@@ -30,15 +26,6 @@ typedef struct bbstreamer_plain_writer
bool should_close_file;
} bbstreamer_plain_writer;
-#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
-{
- bbstreamer base;
- char *pathname;
- gzFile gzfile;
-} bbstreamer_gzip_writer;
-#endif
-
typedef struct bbstreamer_extractor
{
bbstreamer base;
@@ -62,22 +49,6 @@ const bbstreamer_ops bbstreamer_plain_writer_ops = {
.free = bbstreamer_plain_writer_free
};
-#ifdef HAVE_LIBZ
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
-static const char *get_gz_error(gzFile gzf);
-
-const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
-};
-#endif
-
static void bbstreamer_extractor_content(bbstreamer *streamer,
bbstreamer_member *member,
const char *data, int len,
@@ -196,159 +167,6 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
- * it to a file.
- *
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
- * for error reporting purposes; if file is NULL, it is also opened and
- * closed so that the data may be written there.
- */
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
-{
-#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
-
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
-
- streamer->pathname = pstrdup(pathname);
-
- if (file == NULL)
- {
- streamer->gzfile = gzopen(pathname, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not create compressed file \"%s\": %m",
- pathname);
- exit(1);
- }
- }
- else
- {
- int fd = dup(fileno(file));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- streamer->gzfile = gzdopen(fd, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
- }
-
- if (gzsetparams(streamer->gzfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
- exit(1);
- }
-
- return &streamer->base;
-#else
- pg_log_error("this build does not support compression");
- exit(1);
-#endif
-}
-
-#ifdef HAVE_LIBZ
-/*
- * Write archive content to gzip file.
- */
-static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- if (len == 0)
- return;
-
- errno = 0;
- if (gzwrite(mystreamer->gzfile, data, len) != len)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- mystreamer->pathname, get_gz_error(mystreamer->gzfile));
- exit(1);
- }
-}
-
-/*
- * End-of-archive processing when writing to a gzip file consists of just
- * calling gzclose.
- *
- * It makes no difference whether we opened the file or the caller did it,
- * because libz provides no way of avoiding a close on the underling file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
- * work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
- */
-static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- errno = 0; /* in case gzclose() doesn't set it */
- if (gzclose(mystreamer->gzfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %m",
- mystreamer->pathname);
- exit(1);
- }
-
- mystreamer->gzfile = NULL;
-}
-
-/*
- * Free memory associated with this bbstreamer.
- */
-static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- Assert(mystreamer->base.bbs_next == NULL);
- Assert(mystreamer->gzfile == NULL);
-
- pfree(mystreamer->pathname);
- pfree(mystreamer);
-}
-
-/*
- * Helper function for libz error reporting.
- */
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
-/*
* Create a bbstreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
new file mode 100644
index 0000000..c144a73
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -0,0 +1,377 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_gzip.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_gzip.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+
+typedef struct bbstreamer_gzip_extractor
+{
+ bbstreamer base;
+ z_stream zstream;
+ size_t bytes_written;
+} bbstreamer_gzip_extractor;
+
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+
+static void bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_extractor_free(bbstreamer *streamer);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbstreamer_ops bbstreamer_gzip_extractor_ops = {
+ .content = bbstreamer_gzip_extractor_content,
+ .finalize = bbstreamer_gzip_extractor_finalize,
+ .free = bbstreamer_gzip_extractor_free
+};
+#endif
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it is also opened and
+ * closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ errno = 0; /* in case gzclose() doesn't set it */
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of gzip
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_gzip_extractor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_extractor *streamer;
+ z_stream *zs;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_extractor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ zs = &streamer->zstream;
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) streamer->base.bbs_buffer.data;
+ zs->avail_out = streamer->base.bbs_buffer.maxlen;
+
+ /*
+ * Data compression was initialized using deflateInit2 to request a gzip
+ * header. Similarly, we are using inflateInit2 to initialize data
+ * decompression.
+ *
+ * Per the documentation of inflateInit2, the second argument is
+ * "windowBits" and it's value must be greater than or equal to the value
+ * provided while compressing the data, so we are using the maximum
+ * possible value for safety.
+ */
+ if (inflateInit2(zs, 15 + 16) != Z_OK)
+ {
+ pg_log_error("could not initialize compression library");
+ exit(1);
+
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Decompress the input data to output buffer until we ran out of the input
+ * data. Each time the output buffer is full invoke bbstreamer_content to pass
+ * on the decompressed data to next streamer.
+ */
+static void
+bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+ z_stream *zs = &mystreamer->zstream;
+
+ zs->next_in = (uint8 *) data;
+ zs->avail_in = len;
+
+ /* Process the current chunk */
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ Assert(mystreamer->bytes_written < mystreamer->base.bbs_buffer.maxlen);
+
+ zs->next_out = (uint8 *)
+ mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ zs->avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * Decompress data starting at zs->next_in, updating zs->next_in
+ * and zs->avail_in, and generate output starting at zs->next_out,
+ * updating zs->next_out and zs->avail_out accordingly.
+ */
+ res = inflate(zs, Z_NO_FLUSH);
+
+ if (res == Z_STREAM_ERROR)
+ {
+ pg_log_error("could not decompress data: %s", zs->msg);
+ exit(1);
+ }
+
+ mystreamer->bytes_written = mystreamer->base.bbs_buffer.maxlen - zs->avail_out;
+
+ /* If the output buffer is full, pass its contents on to the next streamer */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
+ mystreamer->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_gzip_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ /*
+ * At the end of the stream, any data still pending in the output buffer
+ * must be forwarded to the next streamer.
+ */
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_gzip_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ bbstreamer_free(mystreamer->base.bbs_next);
+ pfree(mystreamer->base.bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 6ee49a5..d43eb4b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -111,6 +111,12 @@ typedef enum
STREAM_WAL
} IncludeWal;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} compression_type;
+
/* Global options */
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
@@ -173,6 +179,10 @@ static int has_xlogendptr = 0;
static volatile LONG has_xlogendptr = 0;
#endif
+/* Server side compression method and compression level */
+static compression_type server_compression_type = BACKUP_COMPRESSION_NONE;
+static int server_compression_level = 0;
+
/* Contents of configuration file to be generated */
static PQExpBuffer recoveryconfcontents = NULL;
@@ -1002,7 +1012,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
- bool is_tar;
+ bool is_tar,
+ is_tar_gz;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1017,6 +1028,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar = (archive_name_len > 4 &&
strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+ /* Is this a gzip archive? */
+ is_tar_gz = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1025,8 +1040,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
- /* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar)
+ /* At present, we only know how to parse tar and gzip archives. */
+ if (must_parse_archive && !is_tar && !is_tar_gz)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1136,6 +1151,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
+ /*
+ * Extract the gzip compressed archive using a gzip extractor and then
+ * forward it to next streamer.
+ */
+ if (format == 'p' && server_compression_type == BACKUP_COMPRESSION_GZIP)
+ streamer = bbstreamer_gzip_extractor_new(streamer);
+
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
return streamer;
@@ -2448,6 +2470,21 @@ main(int argc, char **argv)
exit(1);
}
+ if (server_compression != NULL)
+ {
+ if (strcmp(server_compression, "gzip") == 0)
+ server_compression_type = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(server_compression) == 5 &&
+ strncmp(server_compression, "gzip", 4) == 0 &&
+ server_compression[4] >= '1' && server_compression[4] <= '9')
+ {
+ server_compression_type = BACKUP_COMPRESSION_GZIP;
+ server_compression_level = server_compression[4] - '0';
+ }
+ }
+ else
+ server_compression_type = BACKUP_COMPRESSION_NONE;
+
/*
* Compression doesn't make sense unless tar format is in use.
*/
--
1.8.3.1
Attachment: v2-0002-Test-plain-format-server-compressed-gzip-backup.patch (text/x-patch)
From b54f40721fedb566cd212061fd2a10fe50c31a5a Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Thu, 20 Jan 2022 17:44:52 +0530
Subject: [PATCH 2/2] Test plain format server compressed gzip backup
---
src/bin/pg_verifybackup/t/008_untar.pl | 111 ++++++++++++++++++++++-----------
1 file changed, 74 insertions(+), 37 deletions(-)
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index 85946cf..0885c5c
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 6;
+use Test::More tests => 10;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -35,6 +35,12 @@ my @test_configuration = (
'decompress_program' => $ENV{'GZIP_PROGRAM'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--server-compress', 'gzip', '-Fp'],
+ 'plain_format' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1"),
}
);
@@ -51,54 +57,85 @@ for my $tc (@test_configuration)
# Take a server-side backup.
my @backup = (
- 'pg_basebackup', '--no-sync', '-cfast', '--target',
- "server:$backup_path", '-Xfetch'
+ 'pg_basebackup', '--no-sync', '-cfast', '-Xfetch'
);
+
+ if (! $tc->{'plain_format'})
+ {
+ push @backup, '--target', "server:$backup_path";
+ }
+ else
+ {
+ # Target cannot be used with plain format backup.
+ push @backup, '-D', "$backup_path";
+
+ # Make sure that backup directory is empty.
+ rmtree($backup_path);
+ }
+
push @backup, @{$tc->{'backup_flags'}};
$primary->command_ok(\@backup,
"server side backup, compression $method");
- # Verify that the we got the files we expected.
- my $backup_files = join(',',
- sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
- my $expected_backup_files = join(',',
- sort ('backup_manifest', $tc->{'backup_archive'}));
- is($backup_files,$expected_backup_files,
- "found expected backup files, compression $method");
-
- # Decompress.
- if (exists $tc->{'decompress_program'})
+ if (! $tc->{'plain_format'})
{
- my @decompress = ($tc->{'decompress_program'});
- push @decompress, @{$tc->{'decompress_flags'}}
- if $tc->{'decompress_flags'};
- push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
- system_or_bail(@decompress);
- }
+ # Verify that we got the files we expected.
+ my $backup_files = join(',',
+ sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
+ my $expected_backup_files = join(',',
+ sort ('backup_manifest', $tc->{'backup_archive'}));
+ is($backup_files,$expected_backup_files,
+ "found expected backup files, compression $method");
+
+ # Decompress.
+ if (exists $tc->{'decompress_program'})
+ {
+ my @decompress = ($tc->{'decompress_program'});
+ push @decompress, @{$tc->{'decompress_flags'}}
+ if $tc->{'decompress_flags'};
+ push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ system_or_bail(@decompress);
+ }
+
+ SKIP: {
+ my $tar = $ENV{TAR};
+ # don't check for a working tar here, to accommodate various odd
+ # cases such as AIX. If tar doesn't work the init_from_backup below
+ # will fail.
+ skip "no tar program available", 1
+ if (!defined $tar || $tar eq '');
- SKIP: {
- my $tar = $ENV{TAR};
- # don't check for a working tar here, to accomodate various odd
- # cases such as AIX. If tar doesn't work the init_from_backup below
- # will fail.
- skip "no tar program available", 1
- if (!defined $tar || $tar eq '');
+ # Untar.
+ mkdir($extract_path);
+ system_or_bail($tar, 'xf', $backup_path . '/base.tar',
+ '-C', $extract_path);
- # Untar.
- mkdir($extract_path);
- system_or_bail($tar, 'xf', $backup_path . '/base.tar',
- '-C', $extract_path);
+ # Verify.
+ $primary->command_ok([ 'pg_verifybackup', '-n',
+ '-m', "$backup_path/backup_manifest", '-e', $extract_path ],
+ "verify backup, compression $method");
+ }
- # Verify.
+ # Cleanup.
+ unlink($backup_path . '/backup_manifest');
+ unlink($backup_path . '/base.tar');
+ rmtree($extract_path);
+ }
+ else
+ {
+ # Verify that we got the files we expected.
+ ok (-f "$backup_path/PG_VERSION", "backup with plain format created");
+ ok (-f "$backup_path/backup_manifest", "backup manifest included");
+
+ # Verify plain format backup with server compression
$primary->command_ok([ 'pg_verifybackup', '-n',
- '-m', "$backup_path/backup_manifest", '-e', $extract_path ],
- "verify backup, compression $method");
+ '-m', "$backup_path/backup_manifest", '-e', $backup_path ],
+ "verify plain format backup, compression $method");
+
+ # Cleanup.
+ rmtree($backup_path);
}
- # Cleanup.
- unlink($backup_path . '/backup_manifest');
- unlink($backup_path . '/base.tar');
- rmtree($extract_path);
}
}
--
1.8.3.1
On Thu, Jan 20, 2022 at 8:00 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Thanks for the feedback, I have incorporated the suggestions and
updated a new patch v2.
Cool. I'll do a detailed review later, but I think this is going in a
good direction.
I tried to add the test coverage for server side gzip compression with
plain format backup using pg_verifybackup. I have modified the test
to use a flag specific to plain format. If this flag is set then it takes a
plain format backup (with server compression enabled) and verifies
this using pg_verifybackup. I have updated (v2-0002) for the test
coverage.
Interesting approach. This unfortunately has the effect of making that
test case file look a bit incoherent -- the comment at the top of the
file isn't really accurate any more, for example, and the plain_format
flag does more than just cause us to use -Fp; it also causes us NOT to
use --target server:X. However, that might be something we can figure
out a way to clean up. Alternatively, we could have a new test case
file that is structured like 002_algorithm.pl but looping over
compression methods rather than checksum algorithms, and testing each
one with --server-compress and -Fp. It might be easier to make that
look nice (but I'm not 100% sure).
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jan 19, 2022 at 4:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
I spent some time thinking about test coverage for the server-side
backup code today and came up with the attached (v12-0003).
I committed the base backup target patch yesterday, and today I
updated the remaining code in light of Michael Paquier's commit
5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.
Michael, I am proposing that we remove this message as part of this commit:
- pg_log_info("no value specified for compression
level, switching to default");
I think most people won't want to specify a compression level, so
emitting a message when they don't seems too verbose.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
Attachment: v13-0001-Server-side-gzip-compression.patch (application/octet-stream)
From b5c10db6e3eaf62ff18ad7462a26b9916b554307 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 21 Jan 2022 13:29:28 -0500
Subject: [PATCH v13] Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
---
doc/src/sgml/protocol.sgml | 22 ++
doc/src/sgml/ref/pg_basebackup.sgml | 29 +-
src/backend/Makefile | 2 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 54 ++++
src/backend/replication/basebackup_gzip.c | 309 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 136 ++++++++--
src/bin/pg_verifybackup/Makefile | 7 +
src/bin/pg_verifybackup/t/008_untar.pl | 104 ++++++++
src/include/replication/basebackup_sink.h | 1 +
10 files changed, 641 insertions(+), 24 deletions(-)
create mode 100644 src/backend/replication/basebackup_gzip.c
create mode 100644 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index cd6dca691e..2d63e0132c 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2719,6 +2719,28 @@ The commands accepted in replication mode are:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>COMPRESSION</literal> <replaceable>'method'</replaceable></term>
+ <listitem>
+ <para>
+ Instructs the server to compress the backup using the specified
+ method. Currently, the only supported method is
+ <literal>gzip</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <listitem>
+ <para>
+ Specifies the compression level to be used. This should only be
+ used in conjunction with the <literal>COMPRESSION</literal> option.
+ The value should be an integer between 1 and 9.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>MAX_RATE</literal> <replaceable>rate</replaceable></term>
<listitem>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 47d11289be..1d0df346b9 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -400,21 +400,36 @@ PostgreSQL documentation
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
<term><option>-Z <replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[{<replaceable class="parameter">client</replaceable>|<replaceable class="parameter">server</replaceable>}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<listitem>
<para>
- Enables compression of tar file output, and specifies the
- compression level (0 through 9, 0 being no compression and 9 being best
- compression). Compression is only available when using the tar
- format, and the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames.
+ Requests compression of the backup. If <literal>client</literal> or
+ <literal>server</literal> is included, it specifies where the
+ compression is to be performed. Compressing on the server will reduce
+ transfer bandwidth but will increase server CPU consumption. The
+ default is <literal>client</literal> except when
+ <literal>--target</literal> is used. In that case, the backup is not
+ being sent to the client, so only server compression is sensible.
+ When <literal>-Xstream</literal>, which is the default, is used,
+ server-side compression will not be applied to the WAL. To compress
+ the WAL, use client-side compression, or
+ specify <literal>-Xfetch</literal>.
</para>
<para>
The compression method can be set to either <literal>gzip</literal>
for compression with <application>gzip</application>, or
<literal>none</literal> for no compression. A compression level
can be optionally specified, by appending the level number after a
- colon (<literal>:</literal>).
+ colon (<literal>:</literal>). If no level is specified, the default
+ compression level will be used. If only a level is specified without
+ mentioning an algorithm, <literal>gzip</literal> compression will
+ be used if the level is greater than 0, and no compression will be
+ used if the level is 0.
+ </para>
+ <para>
+ When the tar format is used, the suffix <filename>.gz</filename> will
+ automatically be added to all tar filenames. Compression is not
+ available in plain format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/Makefile b/src/backend/Makefile
index add9560be4..4a02006788 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
# The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
ifeq ($(with_systemd),yes)
LIBS += -lsystemd
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index a8f4757f0c..8ec60ded76 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -18,6 +18,7 @@ OBJS = \
backup_manifest.o \
basebackup.o \
basebackup_copy.o \
+ basebackup_gzip.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d32da51535..10ce2406c0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -61,6 +61,12 @@ typedef enum
BACKUP_TARGET_SERVER
} backup_target_type;
+typedef enum
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP
+} basebackup_compression_type;
+
typedef struct
{
const char *label;
@@ -73,6 +79,8 @@ typedef struct
backup_target_type target;
char *target_detail;
backup_manifest_option manifest;
+ basebackup_compression_type compression;
+ int compression_level;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -707,11 +715,14 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_target = false;
bool o_target_detail = false;
char *target_str = "compat"; /* placate compiler */
+ bool o_compression = false;
+ bool o_compression_level = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
+ opt->compression = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -881,7 +892,41 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_detail = optval;
o_target_detail = true;
}
+ else if (strcmp(defel->defname, "compression") == 0)
+ {
+ char *optval = defGetString(defel);
+
+ if (o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ if (strcmp(optval, "none") == 0)
+ opt->compression = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(optval, "gzip") == 0)
+ opt->compression = BACKUP_COMPRESSION_GZIP;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized compression algorithm: \"%s\"",
+ optval)));
+ o_compression = true;
+ }
+ else if (strcmp(defel->defname, "compression_level") == 0)
+ {
+ if (o_compression_level)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->compression_level = defGetInt32(defel);
+ o_compression_level = true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized base backup option: \"%s\"",
+ defel->defname)));
}
+
if (opt->label == NULL)
opt->label = "base backup";
if (opt->manifest == MANIFEST_OPTION_NO)
@@ -908,6 +953,11 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("target '%s' does not accept a target detail",
target_str)));
}
+
+ if (o_compression_level && !o_compression)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("compression level requires compression")));
}
@@ -975,6 +1025,10 @@ SendBaseBackup(BaseBackupCmd *cmd)
if (opt.maxrate > 0)
sink = bbsink_throttle_new(sink, opt.maxrate);
+ /* Set up server-side compression, if client requested it */
+ if (opt.compression == BACKUP_COMPRESSION_GZIP)
+ sink = bbsink_gzip_new(sink, opt.compression_level);
+
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
new file mode 100644
index 0000000000..1e58382fa0
--- /dev/null
+++ b/src/backend/replication/basebackup_gzip.c
@@ -0,0 +1,309 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_gzip.c
+ * Basebackup sink implementing gzip compression.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_gzip.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbsink_gzip
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ /* Compressed data stream. */
+ z_stream zstream;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_gzip;
+
+static void bbsink_gzip_begin_backup(bbsink *sink);
+static void bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_gzip_archive_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_gzip_end_archive(bbsink *sink);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbsink_ops bbsink_gzip_ops = {
+ .begin_backup = bbsink_gzip_begin_backup,
+ .begin_archive = bbsink_gzip_begin_archive,
+ .archive_contents = bbsink_gzip_archive_contents,
+ .end_archive = bbsink_gzip_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_gzip_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs gzip compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_gzip_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZ
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression is not supported by this build")));
+#else
+ bbsink_gzip *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 9);
+
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
+ else if (compresslevel < 0 || compresslevel > 9)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("gzip compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_gzip));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZ
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_gzip_begin_backup(bbsink *sink)
+{
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ sink->bbs_buffer = palloc(sink->bbs_buffer_length);
+
+ /*
+ * Since deflate() doesn't require the output buffer to be of any
+ * particular size, we can just make it the same size as the input buffer.
+ */
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state,
+ sink->bbs_buffer_length);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_gzip_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ char *gz_archive_name;
+ z_stream *zs = &mysink->zstream;
+
+ /* Initialize compressor object. */
+ memset(zs, 0, sizeof(z_stream));
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) sink->bbs_next->bbs_buffer;
+ zs->avail_out = sink->bbs_next->bbs_buffer_length;
+
+ /*
+ * We need to use deflateInit2() rather than deflateInit() here so that
+ * we can request a gzip header rather than a zlib header. Otherwise, we
+ * want to supply the same values that would have been used by default
+ * if we had just called deflateInit().
+ *
+ * Per the documentation for deflateInit2, the third argument must be
+ * Z_DEFLATED; the fourth argument is the number of "window bits", by
+ * default 15, but adding 16 gets you a gzip header rather than a zlib
+ * header; the fifth argument controls memory usage, and 8 is the default;
+ * and likewise Z_DEFAULT_STRATEGY is the default for the sixth argument.
+ */
+ if (deflateInit2(zs, mysink->compresslevel, Z_DEFLATED, 15 + 16, 8,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not initialize compression library"));
+
+ /*
+ * Add ".gz" to the archive name. Note that the pg_basebackup -z
+ * produces archives named ".tar.gz" rather than ".tgz", so we match
+ * that here.
+ */
+ gz_archive_name = psprintf("%s.gz", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, gz_archive_name);
+ pfree(gz_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer fills up, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_gzip_end_archive() is invoked.
+ */
+static void
+bbsink_gzip_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* Compress data from input buffer. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = len;
+
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * Try to compress. Note that this will update zs->next_in and
+ * zs->avail_in according to how much input data was consumed, and
+ * zs->next_out and zs->avail_out according to how many output bytes
+ * were produced.
+ *
+ * According to the zlib documentation, Z_STREAM_ERROR should only
+ * occur if we've made a programming error, or if say there's been a
+ * memory clobber; we use elog() rather than Assert() here out of an
+ * abundance of caution.
+ */
+ res = deflate(zs, Z_NO_FLUSH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * If the output buffer is full, it's time for the next sink to
+ * process the contents.
+ */
+ if (mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * There might be some data inside zlib's internal buffers; we need to get
+ * that flushed out and forwarded to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_gzip_end_archive(bbsink *sink)
+{
+ bbsink_gzip *mysink = (bbsink_gzip *) sink;
+ z_stream *zs = &mysink->zstream;
+
+ /* There is no more data available. */
+ zs->next_in = (uint8 *) mysink->base.bbs_buffer;
+ zs->avail_in = 0;
+
+ while (1)
+ {
+ int res;
+
+ /* Write output data into unused portion of output buffer. */
+ Assert(mysink->bytes_written < mysink->base.bbs_next->bbs_buffer_length);
+ zs->next_out = (uint8 *)
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written;
+ zs->avail_out =
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written;
+
+ /*
+ * As bbsink_gzip_archive_contents, but pass Z_FINISH since there
+ * is no more input.
+ */
+ res = deflate(zs, Z_FINISH);
+ if (res == Z_STREAM_ERROR)
+ elog(ERROR, "could not compress data: %s", zs->msg);
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written =
+ mysink->base.bbs_next->bbs_buffer_length - zs->avail_out;
+
+ /*
+ * Apparently we had no data in the output buffer and deflate()
+ * was not able to add any. We must be done.
+ */
+ if (mysink->bytes_written == 0)
+ break;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /* Must also pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_gzip_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d5b0ade10d..6fdd1b9958 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -111,6 +111,16 @@ typedef enum
STREAM_WAL
} IncludeWal;
+/*
+ * Different places to perform compression
+ */
+typedef enum
+{
+ COMPRESS_LOCATION_UNSPECIFIED,
+ COMPRESS_LOCATION_CLIENT,
+ COMPRESS_LOCATION_SERVER
+} CompressionLocation;
+
/* Global options */
static char *basedir = NULL;
static TablespaceList tablespace_dirs = {NULL, NULL};
@@ -124,6 +134,7 @@ static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
static WalCompressionMethod compressmethod = COMPRESSION_NONE;
+static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -544,6 +555,11 @@ LogStreamerMain(logstreamer_param *param)
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
COMPRESSION_NONE, 0,
stream.do_sync);
+ else if (compressloc != COMPRESS_LOCATION_CLIENT)
+ stream.walmethod = CreateWalTarMethod(param->xlog,
+ COMPRESSION_NONE,
+ compresslevel,
+ stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
compressmethod,
@@ -944,7 +960,7 @@ parse_max_rate(char *src)
*/
static void
parse_compress_options(char *src, WalCompressionMethod *methodres,
- int *levelres)
+ CompressionLocation *locationres, int *levelres)
{
char *sep;
int firstlen;
@@ -967,9 +983,25 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
* compression method.
*/
if (pg_strcasecmp(firstpart, "gzip") == 0)
+ {
+ *methodres = COMPRESSION_GZIP;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ }
+ else if (pg_strcasecmp(firstpart, "client-gzip") == 0)
+ {
+ *methodres = COMPRESSION_GZIP;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
+ else if (pg_strcasecmp(firstpart, "server-gzip") == 0)
+ {
*methodres = COMPRESSION_GZIP;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
+ {
*methodres = COMPRESSION_NONE;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ }
else
{
/*
@@ -983,6 +1015,7 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = (*levelres > 0) ?
COMPRESSION_GZIP : COMPRESSION_NONE;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
return;
}
@@ -1075,7 +1108,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
+ bool is_tar;
bool must_parse_archive;
+ int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,13 +1119,32 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
+ /* Is this a tar archive? */
+ is_tar = (archive_name_len > 4 &&
+ strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
+ * However, we only know how to parse tar archives.
*/
must_parse_archive = (format == 'p' || inject_manifest ||
(spclocation == NULL && writerecoveryconf));
+ /* At present, we only know how to parse tar archives. */
+ if (must_parse_archive && !is_tar)
+ {
+ pg_log_error("unable to parse archive: %s", archive_name);
+ pg_log_info("only tar archives can be parsed");
+ if (format == 'p')
+ pg_log_info("plain format requires pg_basebackup to parse the archive");
+ if (inject_manifest)
+ pg_log_info("using - as the output directory requires pg_basebackup to parse the archive");
+ if (writerecoveryconf)
+ pg_log_info("the -R option requires pg_basebackup to parse the archive");
+ exit(1);
+ }
+
if (format == 'p')
{
const char *directory;
@@ -1131,7 +1185,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
- if (compressmethod == COMPRESSION_NONE)
+ if (compressmethod == COMPRESSION_NONE ||
+ compressloc != COMPRESS_LOCATION_CLIENT)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
#ifdef HAVE_LIBZ
@@ -1833,6 +1888,31 @@ BaseBackup(void)
AppendStringCommandOption(&buf, use_new_option_syntax,
"TARGET", "client");
+ if (compressloc == COMPRESS_LOCATION_SERVER)
+ {
+ char *compressmethodstr = NULL;
+
+ if (!use_new_option_syntax)
+ {
+ pg_log_error("server does not support server-side compression");
+ exit(1);
+ }
+ switch (compressmethod)
+ {
+ case COMPRESSION_GZIP:
+ compressmethodstr = "gzip";
+ break;
+ default:
+ Assert(false);
+ break;
+ }
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION", compressmethodstr);
+ if (compresslevel != 0)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_LEVEL", compresslevel);
+ }
+
if (verbose)
pg_log_info("initiating base backup, waiting for checkpoint to complete");
@@ -2359,10 +2439,11 @@ main(int argc, char **argv)
compresslevel = 1; /* will be rejected below */
#endif
compressmethod = COMPRESSION_GZIP;
+ compressloc = COMPRESS_LOCATION_UNSPECIFIED;
break;
case 'Z':
parse_compress_options(optarg, &compressmethod,
- &compresslevel);
+ &compressloc, &compresslevel);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2489,14 +2570,37 @@ main(int argc, char **argv)
}
/*
- * Compression doesn't make sense unless tar format is in use.
+ * If we're compressing the backup and the user has not said where to
+ * perform the compression, do it on the client, unless they specified
+ * --target, in which case the server is the only choice.
*/
- if (format == 'p' && compressmethod != COMPRESSION_NONE)
+ if (compressmethod != COMPRESSION_NONE &&
+ compressloc == COMPRESS_LOCATION_UNSPECIFIED)
{
if (backup_target == NULL)
- pg_log_error("only tar mode backups can be compressed");
+ compressloc = COMPRESS_LOCATION_CLIENT;
else
- pg_log_error("client-side compression is not possible when a backup target is specfied");
+ compressloc = COMPRESS_LOCATION_SERVER;
+ }
+
+ /*
+ * Can't perform client-side compression if the backup is not being
+ * sent to the client.
+ */
+ if (backup_target != NULL && compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ pg_log_error("client-side compression is not possible when a backup target is specified");
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+
+ /*
+ * Compression doesn't make sense unless tar format is in use.
+ */
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
@@ -2609,23 +2713,23 @@ main(int argc, char **argv)
}
break;
case COMPRESSION_GZIP:
-#ifdef HAVE_LIBZ
- if (compresslevel == 0)
- {
- pg_log_info("no value specified for compression level, switching to default");
- compresslevel = Z_DEFAULT_COMPRESSION;
- }
if (compresslevel > 9)
{
pg_log_error("compression level %d of method %s higher than maximum of 9",
compresslevel, "gzip");
exit(1);
}
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+#ifdef HAVE_LIBZ
+ if (compresslevel == 0)
+ compresslevel = Z_DEFAULT_COMPRESSION;
#else
- pg_log_error("this build does not support compression with %s",
- "gzip");
- exit(1);
+ pg_log_error("this build does not support compression with %s",
+ "gzip");
+ exit(1);
#endif
+ }
break;
case COMPRESSION_LZ4:
/* option not supported */
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index c07643b129..1ae818f9a1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -3,6 +3,13 @@
PGFILEDESC = "pg_verifybackup - verify a backup against using a backup manifest"
PGAPPICON = win32
+# make these available to TAP test scripts
+export TAR
+# Note that GZIP cannot be used directly as this environment variable is
+# used by the command "gzip" to pass down options, so stick with a different
+# name.
+export GZIP_PROGRAM=$(GZIP)
+
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
new file mode 100644
index 0000000000..1d74a41886
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -0,0 +1,104 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test case aims to verify that server-side backups and server-side
+# backup compression work properly, and it also aims to verify that
+# pg_verifybackup can verify a base backup that didn't start out in plain
+# format.
+
+use strict;
+use warnings;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 6;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my $have_zlib = check_pg_config("#define HAVE_LIBZ 1");
+my $backup_path = $primary->backup_dir . '/server-backup';
+my $extract_path = $primary->backup_dir . '/extracted-backup';
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'backup_archive' => 'base.tar',
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--compress', 'server-gzip'],
+ 'backup_archive' => 'base.tar.gz',
+ 'decompress_program' => $ENV{'GZIP_PROGRAM'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+ skip "no decompressor available for $method", 3
+ if exists $tc->{'decompress_program'} &&
+ !defined $tc->{'decompress_program'};
+
+ # Take a server-side backup.
+ my @backup = (
+ 'pg_basebackup', '--no-sync', '-cfast', '--target',
+ "server:$backup_path", '-Xfetch'
+ );
+ push @backup, @{$tc->{'backup_flags'}};
+ $primary->command_ok(\@backup,
+ "server side backup, compression $method");
+
+ # Verify that we got the files we expected.
+ my $backup_files = join(',',
+ sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
+ my $expected_backup_files = join(',',
+ sort ('backup_manifest', $tc->{'backup_archive'}));
+ is($backup_files,$expected_backup_files,
+ "found expected backup files, compression $method");
+
+ # Decompress.
+ if (exists $tc->{'decompress_program'})
+ {
+ my @decompress = ($tc->{'decompress_program'});
+ push @decompress, @{$tc->{'decompress_flags'}}
+ if $tc->{'decompress_flags'};
+ push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ system_or_bail(@decompress);
+ }
+
+ SKIP: {
+ my $tar = $ENV{TAR};
+ # don't check for a working tar here, to accommodate various odd
+ # cases such as AIX. If tar doesn't work the init_from_backup below
+ # will fail.
+ skip "no tar program available", 1
+ if (!defined $tar || $tar eq '');
+
+ # Untar.
+ mkdir($extract_path);
+ system_or_bail($tar, 'xf', $backup_path . '/base.tar',
+ '-C', $extract_path);
+
+ # Verify.
+ $primary->command_ok([ 'pg_verifybackup', '-n',
+ '-m', "$backup_path/backup_manifest", '-e', $extract_path ],
+ "verify backup, compression $method");
+ }
+
+ # Cleanup.
+ unlink($backup_path . '/backup_manifest');
+ unlink($backup_path . '/base.tar');
+ rmtree($extract_path);
+ }
+}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 4acadf406d..d3276b2487 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
+extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.24.3 (Apple Git-128)
On Thu, Jan 20, 2022 at 11:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 20, 2022 at 8:00 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Thanks for the feedback, I have incorporated the suggestions and
updated a new patch v2.
Cool. I'll do a detailed review later, but I think this is going in a
good direction.
Here is a more detailed review.
+ if (inflateInit2(zs, 15 + 16) != Z_OK)
+ {
+ pg_log_error("could not initialize compression library");
+ exit(1);
+
+ }
Extra blank line.
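(As an aside for anyone following along: the 15 + 16 value here mirrors
the deflateInit2() call on the server side, where adding 16 to the
window bits requests a gzip wrapper rather than a zlib one. A minimal
standalone round-trip illustrating the convention -- separate from the
patch, and using the stock zlib allocators rather than the palloc
wrappers -- might look like this:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	const char	src[] = "hello, base backup";
	unsigned char gz[128];
	char		out[128];
	z_stream	zs;
	size_t		gzlen;

	/* Compress with a gzip (not zlib) wrapper: windowBits = 15 + 16. */
	memset(&zs, 0, sizeof(zs));
	if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8,
					 Z_DEFAULT_STRATEGY) != Z_OK)
		return 1;
	zs.next_in = (unsigned char *) src;
	zs.avail_in = sizeof(src);	/* include the terminating NUL */
	zs.next_out = gz;
	zs.avail_out = sizeof(gz);
	if (deflate(&zs, Z_FINISH) != Z_STREAM_END)
		return 1;
	gzlen = sizeof(gz) - zs.avail_out;
	deflateEnd(&zs);

	/* Decompress: windowBits must be >= the value used to compress. */
	memset(&zs, 0, sizeof(zs));
	if (inflateInit2(&zs, 15 + 16) != Z_OK)
		return 1;
	zs.next_in = gz;
	zs.avail_in = gzlen;
	zs.next_out = (unsigned char *) out;
	zs.avail_out = sizeof(out);
	if (inflate(&zs, Z_FINISH) != Z_STREAM_END)
		return 1;
	inflateEnd(&zs);

	printf("%s\n", out);		/* prints the original string */
	return 0;
}

Passing 15 + 32 to inflateInit2() instead would auto-detect zlib
vs. gzip framing, which is another way to satisfy the "greater than or
equal" requirement the patch's comment describes.)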
+ /* At present, we only know how to parse tar and gzip archives. */
gzip -> tar.gz. You can gzip something that is not a tar.
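That is, the check should look for the full ".tar.gz" suffix rather
than just ".gz", since a bare ".gz" file need not contain a tar
archive. Roughly like this (a sketch with a hypothetical helper, not
the patch's actual code):

#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does "name" end with "suffix"? */
static bool
ends_with(const char *name, const char *suffix)
{
	size_t		namelen = strlen(name);
	size_t		suffixlen = strlen(suffix);

	return namelen > suffixlen &&
		strcmp(name + namelen - suffixlen, suffix) == 0;
}

/*
 * Then the detection becomes:
 *
 *		is_tar = ends_with(archive_name, ".tar");
 *		is_tar_gz = ends_with(archive_name, ".tar.gz");
 */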
+ * Extract the gzip compressed archive using a gzip extractor and then
+ * forward it to next streamer.
This comment is not good. First, we're not necessarily doing it.
Second, it just describes what the code does, not why it does it.
Maybe something like "If the user requested both that the server
compress the backup and also that we extract the backup, we need to
decompress it."
+ if (server_compression != NULL)
+ {
+ if (strcmp(server_compression, "gzip") == 0)
+ server_compression_type = BACKUP_COMPRESSION_GZIP;
+ else if (strlen(server_compression) == 5 &&
+ strncmp(server_compression, "gzip", 4) == 0 &&
+ server_compression[4] >= '1' && server_compression[4] <= '9')
+ {
+ server_compression_type = BACKUP_COMPRESSION_GZIP;
+ server_compression_level = server_compression[4] - '0';
+ }
+ }
+ else
+ server_compression_type = BACKUP_COMPRESSION_NONE;
I think this is not required any more. I think probably some other
things need to be adjusted as well, based on Michael's changes and the
updates in my patch to match.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
Here is a more detailed review.
Thanks for the feedback. I have incorporated the suggestions and
updated the patch to a new version (v3-0001).
The required documentation changes are also incorporated in the
updated patch (v3-0001).
Interesting approach. This unfortunately has the effect of making that
test case file look a bit incoherent -- the comment at the top of the
file isn't really accurate any more, for example, and the plain_format
flag does more than just cause us to use -Fp; it also causes us NOT to
use --target server:X. However, that might be something we can figure
out a way to clean up. Alternatively, we could have a new test case
file that is structured like 002_algorithm.pl but looping over
compression methods rather than checksum algorithms, and testing each
one with --server-compress and -Fp. It might be easier to make that
look nice (but I'm not 100% sure).
Added a new test case file "009_extract.pl" to test server-compressed
plain format backup (v3-0002).
I committed the base backup target patch yesterday, and today I
updated the remaining code in light of Michael Paquier's commit
5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.
The v13 patch does not apply on the latest head; it requires a rebase.
I have applied it on commit dc43fc9b3aa3e0fa9c84faddad6d301813580f88 to
validate the gzip decompression patches.
Thanks,
Dipesh
Attachments:
v3-0001-Support-for-extracting-gzip-compressed-archive.patch (text/x-patch)
From 9ec2efcc908e988409cd9ba19ea64a50012163a2 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Mon, 24 Jan 2022 15:28:48 +0530
Subject: [PATCH 1/2] Support for extracting gzip compressed archive
pg_basebackup supports server-side compression using gzip. In order to
support plain format backup with option '-Fp', we need to add support
for decompressing the compressed blocks at the client. This patch
addresses the extraction of gzip-compressed blocks at the client.
---
doc/src/sgml/ref/pg_basebackup.sgml | 8 +-
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_file.c | 182 ----------------
src/bin/pg_basebackup/bbstreamer_gzip.c | 376 ++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 19 +-
6 files changed, 401 insertions(+), 186 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df34..19849be 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -428,8 +428,12 @@ PostgreSQL documentation
</para>
<para>
When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. Compression is not
- available in plain format.
+ automatically be added to all tar filenames.
+ </para>
+ <para>
+ Server compression can be specified with plain format backup. The
+ archive is compressed at the server and extracted at the client.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 5b18851..78d96c6 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -38,6 +38,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
bbstreamer_file.o \
+ bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_tar.o
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fc88b50..270b0df 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
+extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
index 77ca222..d721f87 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -11,10 +11,6 @@
#include "postgres_fe.h"
-#ifdef HAVE_LIBZ
-#include <zlib.h>
-#endif
-
#include <unistd.h>
#include "bbstreamer.h"
@@ -30,15 +26,6 @@ typedef struct bbstreamer_plain_writer
bool should_close_file;
} bbstreamer_plain_writer;
-#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
-{
- bbstreamer base;
- char *pathname;
- gzFile gzfile;
-} bbstreamer_gzip_writer;
-#endif
-
typedef struct bbstreamer_extractor
{
bbstreamer base;
@@ -62,22 +49,6 @@ const bbstreamer_ops bbstreamer_plain_writer_ops = {
.free = bbstreamer_plain_writer_free
};
-#ifdef HAVE_LIBZ
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
-static const char *get_gz_error(gzFile gzf);
-
-const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
-};
-#endif
-
static void bbstreamer_extractor_content(bbstreamer *streamer,
bbstreamer_member *member,
const char *data, int len,
@@ -196,159 +167,6 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
- * it to a file.
- *
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
- * for error reporting purposes; if file is NULL, it is also the opened and
- * closed so that the data may be written there.
- */
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
-{
-#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
-
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
-
- streamer->pathname = pstrdup(pathname);
-
- if (file == NULL)
- {
- streamer->gzfile = gzopen(pathname, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not create compressed file \"%s\": %m",
- pathname);
- exit(1);
- }
- }
- else
- {
- int fd = dup(fileno(file));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- streamer->gzfile = gzdopen(fd, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
- }
-
- if (gzsetparams(streamer->gzfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
- exit(1);
- }
-
- return &streamer->base;
-#else
- pg_log_error("this build does not support compression");
- exit(1);
-#endif
-}
-
-#ifdef HAVE_LIBZ
-/*
- * Write archive content to gzip file.
- */
-static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- if (len == 0)
- return;
-
- errno = 0;
- if (gzwrite(mystreamer->gzfile, data, len) != len)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- mystreamer->pathname, get_gz_error(mystreamer->gzfile));
- exit(1);
- }
-}
-
-/*
- * End-of-archive processing when writing to a gzip file consists of just
- * calling gzclose.
- *
- * It makes no difference whether we opened the file or the caller did it,
- * because libz provides no way of avoiding a close on the underling file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
- * work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
- */
-static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- errno = 0; /* in case gzclose() doesn't set it */
- if (gzclose(mystreamer->gzfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %m",
- mystreamer->pathname);
- exit(1);
- }
-
- mystreamer->gzfile = NULL;
-}
-
-/*
- * Free memory associated with this bbstreamer.
- */
-static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- Assert(mystreamer->base.bbs_next == NULL);
- Assert(mystreamer->gzfile == NULL);
-
- pfree(mystreamer->pathname);
- pfree(mystreamer);
-}
-
-/*
- * Helper function for libz error reporting.
- */
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
-/*
* Create a bbstreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
new file mode 100644
index 0000000..1144090
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -0,0 +1,376 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_gzip.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_gzip.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+
+typedef struct bbstreamer_gzip_extractor
+{
+ bbstreamer base;
+ z_stream zstream;
+ size_t bytes_written;
+} bbstreamer_gzip_extractor;
+
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+
+static void bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_extractor_free(bbstreamer *streamer);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbstreamer_ops bbstreamer_gzip_extractor_ops = {
+ .content = bbstreamer_gzip_extractor_content,
+ .finalize = bbstreamer_gzip_extractor_finalize,
+ .free = bbstreamer_gzip_extractor_free
+};
+#endif
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it is also opened and
+ * closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ errno = 0; /* in case gzclose() doesn't set it */
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of gzip
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_gzip_extractor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_extractor *streamer;
+ z_stream *zs;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_extractor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ zs = &streamer->zstream;
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) streamer->base.bbs_buffer.data;
+ zs->avail_out = streamer->base.bbs_buffer.maxlen;
+
+ /*
+ * Data compression was initialized using deflateInit2 to request a gzip
+ * header. Similarly, we are using inflateInit2 to initialize data
+ * decompression.
+ *
+ * Per the documentation of inflateInit2, the second argument is
+ * "windowBits" and it's value must be greater than or equal to the value
+ * provided while compressing the data, so we are using the maximum
+ * possible value for safety.
+ */
+ if (inflateInit2(zs, 15 + 16) != Z_OK)
+ {
+ pg_log_error("could not initialize compression library");
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Decompress the input data into the output buffer until we run out of
+ * input data. Each time the output buffer is full, invoke
+ * bbstreamer_content to pass the decompressed data on to the next streamer.
+ */
+static void
+bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+ z_stream *zs = &mystreamer->zstream;
+
+ zs->next_in = (uint8 *) data;
+ zs->avail_in = len;
+
+ /* Process the current chunk */
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ Assert(mystreamer->bytes_written < mystreamer->base.bbs_buffer.maxlen);
+
+ zs->next_out = (uint8 *)
+ mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ zs->avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * Decompress data starting at zs->next_in, updating zs->next_in and
+ * zs->avail_in, and generate output data starting at zs->next_out,
+ * updating zs->next_out and zs->avail_out accordingly.
+ */
+ res = inflate(zs, Z_NO_FLUSH);
+
+ if (res == Z_STREAM_ERROR)
+ pg_log_error("could not decompress data: %s", zs->msg);
+
+ mystreamer->bytes_written = mystreamer->base.bbs_buffer.maxlen - zs->avail_out;
+
+ /* If output buffer is full then pass on the content to next streamer */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
+ mystreamer->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_gzip_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ /*
+ * At the end of the stream, any pending data in the output buffer must
+ * be forwarded to the next streamer.
+ */
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_gzip_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ bbstreamer_free(mystreamer->base.bbs_next);
+ pfree(mystreamer->base.bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 6fdd1b9..6b7cff9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1108,7 +1108,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
- bool is_tar;
+ bool is_tar,
+ is_tar_gz;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1123,6 +1124,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar = (archive_name_len > 4 &&
strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+ /* Is this a gzip archive? */
+ is_tar_gz = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1132,7 +1137,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar)
+ if (must_parse_archive && !is_tar && !is_tar_gz)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1246,6 +1251,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
+#ifdef HAVE_LIBZ
+ /*
+ * If the user has requested a server-compressed archive along with
+ * archive extraction at the client, then we need to decompress it.
+ */
+ if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+ compressloc == COMPRESS_LOCATION_SERVER)
+ streamer = bbstreamer_gzip_extractor_new(streamer);
+#endif
+
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
return streamer;
--
1.8.3.1
v3-0002-Test-plain-format-server-compressed-gzip-backup.patch (text/x-patch)
From 534ae0b539fe981361a50e1b2794ff88f466b5ff Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Mon, 24 Jan 2022 18:06:12 +0530
Subject: [PATCH 2/2] Test plain format server compressed gzip backup
---
src/bin/pg_verifybackup/t/009_extract.pl | 66 ++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
create mode 100755 src/bin/pg_verifybackup/t/009_extract.pl
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
new file mode 100755
index 0000000..0eeab46
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -0,0 +1,66 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test aims to verify server compression for plain format backup.
+
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 7;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--compress', 'server-gzip:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $backup_path = $primary->backup_dir . '/' . 'extract_backup';
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+
+ # Take backup with server compression enabled.
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path,
+ '-Xfetch', '--no-sync', '-cfast', '-Fp');
+ push @backup, @{$tc->{'backup_flags'}};
+
+ my @verify = ('pg_verifybackup', '-e', $backup_path);
+
+ # A backup with a valid compression method should work.
+ $primary->command_ok(\@backup, "backup ok with compression method \"$method\"");
+
+ # Verify that backup is extracted
+ if ($method ne 'none')
+ {
+ ok (-f "$backup_path/PG_VERSION", "extracted compressed backup, compression method \"$method\"");
+ }
+ ok(-f "$backup_path/backup_manifest", "backup manifest exists, compression method \"$method\"");
+
+ # Make sure that it verifies OK.
+ $primary->command_ok(\@verify,
+ "verify backup with compression method \"$method\"");
+ }
+
+ # Remove backup immediately to save disk space.
+ rmtree($backup_path);
+}
--
1.8.3.1
On Mon, Jan 24, 2022 at 9:30 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
The v13 patch does not apply on the latest head; it requires a rebase.
I have applied it on commit dc43fc9b3aa3e0fa9c84faddad6d301813580f88 to
validate the gzip decompression patches.
It only needed trivial rebasing; I have committed it after doing that.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
Thank you for committing a great feature. I have tested the committed features.
The attached small patch fixes the output of the --help message. In the previous commit, only gzip and none were output, but in the attached patch, client-gzip and server-gzip are added.
Regards,
Noriyoshi Shinoda
-----Original Message-----
From: Robert Haas <robertmhaas@gmail.com>
Sent: Saturday, January 22, 2022 3:33 AM
To: Dipesh Pandit <dipesh.pandit@gmail.com>; Michael Paquier <michael@paquier.xyz>
Cc: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>; tushar <tushar.ahuja@enterprisedb.com>; Dmitry Dolgov <9erthalion6@gmail.com>; Mark Dilger <mark.dilger@enterprisedb.com>; pgsql-hackers@postgresql.org
Subject: Re: refactoring basebackup.c
On Wed, Jan 19, 2022 at 4:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
I spent some time thinking about test coverage for the server-side
backup code today and came up with the attached (v12-0003).
I committed the base backup target patch yesterday, and today I updated the remaining code in light of Michael Paquier's commit 5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.
Michael, I am proposing that we remove this message as part of this commit:
- pg_log_info("no value specified for compression
level, switching to default");
I think most people won't want to specify a compression level, so emitting a message when they don't seems too verbose.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
pg_basebackup_option_v1.diff (application/octet-stream)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 72c27c7..6703493 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={gzip,client-gzip,server-gzip,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
"Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi.shinoda@hpe.com> writes:
Hi,
Thank you for committing a great feature. I have tested the committed features.
The attached small patch fixes the output of the --help message. In the
previous commit, only gzip and none were output, but in the attached
patch, client-gzip and server-gzip are added.
I think it would be better to write that as `[{client,server}-]gzip`,
especially as we add more compression algorithms, where it would
presumably become `[{client,server}-]METHOD` (assuming all methods are
supported on both the client and server side).
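For example, under that scheme the option could be spelled as follows
(illustrative invocations; the server-gzip forms match ones used elsewhere
in this thread):

pg_basebackup -D backup -Ft --compress=gzip:9
pg_basebackup -D backup -Ft --compress=client-gzip:5
pg_basebackup -D backup -Fp --compress=server-gzip:5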
I also noticed that in the docs, the `client` and `server` are marked up
as replaceable parameters, when they are actually literals, plus the
hyphen is misplaced. The `--checkpoint` option also has the `fast` and
`spread` literals marked up as parameters.
All of these are fixed in the attached patch.
- ilmari
Attachments:
0001-pg_basebackup-documentation-and-help-fixes.patch (text/x-diff)
From 8e3d191917984a6d17f2c72212d90c96467463b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 25 Jan 2022 13:04:05 +0000
Subject: [PATCH] pg_basebackup documentation and help fixes
Don't mark up literals as replaceable parameters and indicate alternatives
correctly with {...|...}.
---
doc/src/sgml/ref/pg_basebackup.sgml | 6 +++---
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df346b9..98c89751b3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -400,7 +400,7 @@
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
<term><option>-Z <replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=[[{<replaceable class="parameter">client|server</replaceable>-}]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<listitem>
<para>
Requests compression of the backup. If <literal>client</literal> or
@@ -441,8 +441,8 @@
<variablelist>
<varlistentry>
- <term><option>-c <replaceable class="parameter">fast|spread</replaceable></option></term>
- <term><option>--checkpoint=<replaceable class="parameter">fast|spread</replaceable></option></term>
+ <term><option>-c {fast|spread}</option></term>
+ <term><option>--checkpoint={fast|spread}</option></term>
<listitem>
<para>
Sets checkpoint mode to fast (immediate) or spread (the default)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 72c27c78d0..46f6f53e9b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
--
2.30.2
Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
"Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi.shinoda@hpe.com> writes:
Hi,
Thank you for committing a great feature. I have tested the committed features.
The attached small patch fixes the output of the --help message. In the
previous commit, only gzip and none were output, but in the attached
patch, client-gzip and server-gzip are added.
I think it would be better to write that as `[{client,server}-]gzip`,
especially as we add more compression algorithms, where it would
presumably become `[{client,server}-]METHOD` (assuming all methods are
supported on both the client and server side).
I also noticed that in the docs, the `client` and `server` are marked up
as replaceable parameters, when they are actually literals, plus the
hyphen is misplaced. The `--checkpoint` option also has the `fast` and
`spread` literals marked up as parameters.
All of these are fixed in the attached patch.
I just noticed there was a superfluous [ in the SGML documentation, and
that the short form was missing the [{client|server}-] part. Updated
patch attached.
- ilmari
Attachments:
v2-0001-pg_basebackup-documentation-and-help-fixes.patch (text/x-diff)
From 2164f1a9fc97a5f88f57c7cc9cdafa67398dcc0e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 25 Jan 2022 13:04:05 +0000
Subject: [PATCH v2] pg_basebackup documentation and help fixes
Don't mark up literals as replaceable parameters and indicate alternatives
correctly with {...|...}, and add missing [{client,server}-] to the
-Z form.
---
doc/src/sgml/ref/pg_basebackup.sgml | 8 ++++----
src/bin/pg_basebackup/pg_basebackup.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df346b9..a5e03d2c66 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -398,9 +398,9 @@
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
- <term><option>-Z <replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=[[{<replaceable class="parameter">client|server</replaceable>-}]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
<listitem>
<para>
Requests compression of the backup. If <literal>client</literal> or
@@ -441,8 +441,8 @@
<variablelist>
<varlistentry>
- <term><option>-c <replaceable class="parameter">fast|spread</replaceable></option></term>
- <term><option>--checkpoint=<replaceable class="parameter">fast|spread</replaceable></option></term>
+ <term><option>-c {fast|spread}</option></term>
+ <term><option>--checkpoint={fast|spread}</option></term>
<listitem>
<para>
Sets checkpoint mode to fast (immediate) or spread (the default)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 72c27c78d0..46f6f53e9b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
--
2.30.2
On 1/22/22 12:03 AM, Robert Haas wrote:
I committed the base backup target patch yesterday, and today I
updated the remaining code in light of Michael Paquier's commit
5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.
Thanks Robert, I tested against the latest PG Head and found a few issues -
A) Getting a syntax error if -z is used along with -t
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/data902 -z -Xfetch
pg_basebackup: error: could not initiate base backup: ERROR: syntax error
OR
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/t2
--compress=server-gzip:9 -Xfetch -v -z
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: error: could not initiate base backup: ERROR: syntax error
B) No information about "client-gzip" or "server-gzip" is shown under
the "--compress" option in the ./pg_basebackup --help output.
C) The -R option is silently ignored
[edb@centos7tushar bin]$ ./pg_basebackup -Z 4 -v -t server:/tmp/pp
-Xfetch -R
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/30000028 on timeline 1
pg_basebackup: write-ahead log end point: 0/30000100
pg_basebackup: base backup completed
[edb@centos7tushar bin]$
Go to the /tmp/pp folder and extract it - there is no "standby.signal"
file, and if we start a cluster against this data directory, it will not
be in standby mode.
If this is not supported, then I think we should throw an error.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Tue, Jan 25, 2022 at 8:42 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
I just noticed there was a superfluous [ in the SGM documentation, and
that the short form was missing the [{client|server}-] part. Updated
patch attaced.
Committed, thanks.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 25, 2022 at 03:54:52AM +0000, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
Michael, I am proposing that we remove this message as part of
this commit:
- pg_log_info("no value specified for compression
level, switching to default");
I think most people won't want to specify a compression level, so
emitting a message when they don't seems too verbose.
(Just noticed this message as I am not in CC.)
Removing this message is fine by me, thanks!
--
Michael
On Tue, Jan 25, 2022 at 09:52:12PM +0530, tushar wrote:
C) The -R option is silently ignored
Go to the /tmp/pp folder and extract it - there is no "standby.signal"
file, and if we start a cluster against this data directory, it will not
be in standby mode.
Yeah, I don't think it's good to silently ignore the option, and we
should not generate the file on the server-side. Rather than erroring
in this case, you'd better add the file to the existing compressed
file of the base data folder on the client-side.
This makes me wonder whether we should begin tracking any open items
for v15. We don't want to lose track of any issues with features
already committed to the tree.
--
Michael
On Tue, Jan 25, 2022 at 8:23 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Jan 25, 2022 at 03:54:52AM +0000, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
Michael, I am proposing that we remove this message as part of
this commit:
- pg_log_info("no value specified for compression
level, switching to default");
I think most people won't want to specify a compression level, so
emitting a message when they don't seems too verbose.
(Just noticed this message as I am not in CC.)
Removing this message is fine by me, thanks!
Oh, I thought I'd CC'd you. I know I meant to do so.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 25, 2022 at 11:22 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
A) Getting a syntax error if -z is used along with -t
[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/data902 -z -Xfetch
pg_basebackup: error: could not initiate base backup: ERROR: syntax error
Oops. The attached patch should fix this.
B) No information about "client-gzip" or "server-gzip" is shown under
the "--compress" option in the ./pg_basebackup --help output.
Already fixed by e1f860f13459e186479319aa9f65ef184277805f.
C) The -R option is silently ignored
The attached patch should fix this, too.
Thanks for finding these issues.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
fix-mistakes-found-by-tushar.patch (application/octet-stream)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 46f6f53e9b..851f03ca81 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1871,6 +1871,12 @@ BaseBackup(void)
exit(1);
}
+ if (writerecoveryconf)
+ {
+ pg_log_error("recovery configuration cannot be written when a backup target is used");
+ exit(1);
+ }
+
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
if ((colon = strchr(backup_target, ':')) == NULL)
@@ -1913,7 +1919,7 @@ BaseBackup(void)
}
AppendStringCommandOption(&buf, use_new_option_syntax,
"COMPRESSION", compressmethodstr);
- if (compresslevel != 0)
+ if (compresslevel != 0 && compresslevel != Z_DEFAULT_COMPRESSION)
AppendIntegerCommandOption(&buf, use_new_option_syntax,
"COMPRESSION_LEVEL", compresslevel);
}
Hi,
It only needed trivial rebasing; I have committed it after doing that.
I have updated the patches to support server compression (gzip) for
plain format backup. Please find the v4 patches attached.
Thanks,
Dipesh
Attachments:
v4-0001-Support-for-extracting-gzip-compressed-archive.patch (text/x-patch)
From 4d0c84d6fac841aafb757535cc0e48334a251581 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Mon, 24 Jan 2022 15:28:48 +0530
Subject: [PATCH 1/2] Support for extracting gzip compressed archive
pg_basebackup supports server-side compression using gzip. In order to
support plain format backup with option '-Fp', we need to add support
for decompressing the compressed blocks at the client. This patch
addresses the extraction of gzip-compressed blocks at the client.
---
doc/src/sgml/ref/pg_basebackup.sgml | 8 +-
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_file.c | 182 ----------------
src/bin/pg_basebackup/bbstreamer_gzip.c | 376 ++++++++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 19 +-
6 files changed, 401 insertions(+), 186 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_gzip.c
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df34..19849be 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -428,8 +428,12 @@ PostgreSQL documentation
</para>
<para>
When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. Compression is not
- available in plain format.
+ automatically be added to all tar filenames.
+ </para>
+ <para>
+ Server compression can be specified with plain format backup. The
+ archive is compressed at the server and extracted at the client.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 5b18851..78d96c6 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -38,6 +38,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
bbstreamer_file.o \
+ bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_tar.o
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fc88b50..270b0df 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
+extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
index 77ca222..d721f87 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -11,10 +11,6 @@
#include "postgres_fe.h"
-#ifdef HAVE_LIBZ
-#include <zlib.h>
-#endif
-
#include <unistd.h>
#include "bbstreamer.h"
@@ -30,15 +26,6 @@ typedef struct bbstreamer_plain_writer
bool should_close_file;
} bbstreamer_plain_writer;
-#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
-{
- bbstreamer base;
- char *pathname;
- gzFile gzfile;
-} bbstreamer_gzip_writer;
-#endif
-
typedef struct bbstreamer_extractor
{
bbstreamer base;
@@ -62,22 +49,6 @@ const bbstreamer_ops bbstreamer_plain_writer_ops = {
.free = bbstreamer_plain_writer_free
};
-#ifdef HAVE_LIBZ
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
-static const char *get_gz_error(gzFile gzf);
-
-const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
-};
-#endif
-
static void bbstreamer_extractor_content(bbstreamer *streamer,
bbstreamer_member *member,
const char *data, int len,
@@ -196,159 +167,6 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
}
/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
- * it to a file.
- *
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
- * for error reporting purposes; if file is NULL, it is also the opened and
- * closed so that the data may be written there.
- */
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
-{
-#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
-
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
-
- streamer->pathname = pstrdup(pathname);
-
- if (file == NULL)
- {
- streamer->gzfile = gzopen(pathname, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not create compressed file \"%s\": %m",
- pathname);
- exit(1);
- }
- }
- else
- {
- int fd = dup(fileno(file));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- streamer->gzfile = gzdopen(fd, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
- }
-
- if (gzsetparams(streamer->gzfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
- exit(1);
- }
-
- return &streamer->base;
-#else
- pg_log_error("this build does not support compression");
- exit(1);
-#endif
-}
-
-#ifdef HAVE_LIBZ
-/*
- * Write archive content to gzip file.
- */
-static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- if (len == 0)
- return;
-
- errno = 0;
- if (gzwrite(mystreamer->gzfile, data, len) != len)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- mystreamer->pathname, get_gz_error(mystreamer->gzfile));
- exit(1);
- }
-}
-
-/*
- * End-of-archive processing when writing to a gzip file consists of just
- * calling gzclose.
- *
- * It makes no difference whether we opened the file or the caller did it,
- * because libz provides no way of avoiding a close on the underling file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
- * work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
- */
-static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- errno = 0; /* in case gzclose() doesn't set it */
- if (gzclose(mystreamer->gzfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %m",
- mystreamer->pathname);
- exit(1);
- }
-
- mystreamer->gzfile = NULL;
-}
-
-/*
- * Free memory associated with this bbstreamer.
- */
-static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- Assert(mystreamer->base.bbs_next == NULL);
- Assert(mystreamer->gzfile == NULL);
-
- pfree(mystreamer->pathname);
- pfree(mystreamer);
-}
-
-/*
- * Helper function for libz error reporting.
- */
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
-/*
* Create a bbstreamer that extracts an archive.
*
* All pathnames in the archive are interpreted relative to basepath.
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
new file mode 100644
index 0000000..1144090
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -0,0 +1,376 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_gzip.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_gzip.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+
+typedef struct bbstreamer_gzip_extractor
+{
+ bbstreamer base;
+ z_stream zstream;
+ size_t bytes_written;
+} bbstreamer_gzip_extractor;
+
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+
+static void bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_extractor_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_extractor_free(bbstreamer *streamer);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbstreamer_ops bbstreamer_gzip_extractor_ops = {
+ .content = bbstreamer_gzip_extractor_content,
+ .finalize = bbstreamer_gzip_extractor_finalize,
+ .free = bbstreamer_gzip_extractor_free
+};
+#endif
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it is also opened and
+ * closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ errno = 0; /* in case gzclose() doesn't set it */
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of gzip
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_gzip_extractor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_extractor *streamer;
+ z_stream *zs;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_extractor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_extractor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ zs = &streamer->zstream;
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) streamer->base.bbs_buffer.data;
+ zs->avail_out = streamer->base.bbs_buffer.maxlen;
+
+ /*
+ * Data compression was initialized using deflateInit2 to request a gzip
+ * header. Similarly, we are using inflateInit2 to initialize data
+ * decompression.
+ *
+ * Per the documentation of inflateInit2, the second argument is
+ * "windowBits" and it's value must be greater than or equal to the value
+ * provided while compressing the data, so we are using the maximum
+ * possible value for safety.
+ */
+ if (inflateInit2(zs, 15 + 16) != Z_OK)
+ {
+ pg_log_error("could not initialize compression library");
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Decompress the input data into the output buffer until we run out of
+ * input data. Each time the output buffer is full, invoke
+ * bbstreamer_content to pass the decompressed data on to the next streamer.
+ */
+static void
+bbstreamer_gzip_extractor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+ z_stream *zs = &mystreamer->zstream;
+
+ zs->next_in = (uint8 *) data;
+ zs->avail_in = len;
+
+ /* Process the current chunk */
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ Assert(mystreamer->bytes_written < mystreamer->base.bbs_buffer.maxlen);
+
+ zs->next_out = (uint8 *)
+ mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ zs->avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * Decompress data starting at zs->next_in, updating zs->next_in and
+ * zs->avail_in, and generate output data starting at zs->next_out,
+ * updating zs->next_out and zs->avail_out accordingly.
+ */
+ res = inflate(zs, Z_NO_FLUSH);
+
+ if (res == Z_STREAM_ERROR)
+ pg_log_error("could not decompress data: %s", zs->msg);
+
+ mystreamer->bytes_written = mystreamer->base.bbs_buffer.maxlen - zs->avail_out;
+
+ /* If output buffer is full then pass on the content to next streamer */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
+ mystreamer->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_gzip_extractor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ /*
+ * At the end of the stream, any pending data in the output buffer must
+ * be forwarded to the next streamer.
+ */
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_gzip_extractor_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_extractor *mystreamer = (bbstreamer_gzip_extractor *) streamer;
+
+ bbstreamer_free(mystreamer->base.bbs_next);
+ pfree(mystreamer->base.bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 72c27c7..9bdc46e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1113,7 +1113,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
- bool is_tar;
+ bool is_tar,
+ is_tar_gz;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1128,6 +1129,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar = (archive_name_len > 4 &&
strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+ /* Is this a gzip archive? */
+ is_tar_gz = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1137,7 +1142,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar)
+ if (must_parse_archive && !is_tar && !is_tar_gz)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1251,6 +1256,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
+#ifdef HAVE_LIBZ
+ /*
+ * If the user has requested a server-compressed archive along with
+ * archive extraction at the client, then we need to decompress it.
+ */
+ if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+ compressloc == COMPRESS_LOCATION_SERVER)
+ streamer = bbstreamer_gzip_extractor_new(streamer);
+#endif
+
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
return streamer;
--
1.8.3.1
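As an aside for readers following the thread: the inflateInit2(zs, 15 + 16) convention used above can be seen in isolation in a minimal standalone zlib program. The sketch below is illustrative only; it is not part of the patch, and the buffer sizes and error handling are simplified:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* Decompress a gzip stream from stdin to stdout (illustrative sketch). */
int
main(void)
{
    z_stream    zs;
    unsigned char inbuf[16384];
    unsigned char outbuf[16384];
    int         res = Z_OK;

    memset(&zs, 0, sizeof(zs));

    /* windowBits 15 = maximum window; +16 = expect a gzip header */
    if (inflateInit2(&zs, 15 + 16) != Z_OK)
        return 1;

    while (res != Z_STREAM_END)
    {
        zs.avail_in = fread(inbuf, 1, sizeof(inbuf), stdin);
        if (zs.avail_in == 0)
            break;
        zs.next_in = inbuf;

        /* Keep inflating until this input chunk has been consumed. */
        do
        {
            zs.next_out = outbuf;
            zs.avail_out = sizeof(outbuf);
            res = inflate(&zs, Z_NO_FLUSH);
            if (res == Z_STREAM_ERROR || res == Z_DATA_ERROR ||
                res == Z_MEM_ERROR || res == Z_NEED_DICT)
            {
                (void) inflateEnd(&zs);
                return 1;
            }
            fwrite(outbuf, 1, sizeof(outbuf) - zs.avail_out, stdout);
        } while (zs.avail_out == 0);
    }

    (void) inflateEnd(&zs);
    return (res == Z_STREAM_END) ? 0 : 1;
}

The value 15 requests the maximum decompression window, and adding 16 tells zlib to expect a gzip header rather than a raw zlib stream, which is why the patch can use the maximum safely regardless of the windowBits used at compression time.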
Attachments:
v4-0002-Test-plain-format-server-compressed-gzip-backup.patch
From c4669b8591cd74ed8f74f521f1c35be7c8240507 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Mon, 24 Jan 2022 18:06:12 +0530
Subject: [PATCH 2/2] Test plain format server compressed gzip backup
---
src/bin/pg_verifybackup/t/009_extract.pl | 66 ++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
create mode 100644 src/bin/pg_verifybackup/t/009_extract.pl
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
new file mode 100644
index 0000000..0eeab46
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -0,0 +1,66 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test aims to verify server compression for plain format backup.
+
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 7;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--compress', 'server-gzip:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $backup_path = $primary->backup_dir . '/' . 'extract_backup';
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+
+ # Take backup with server compression enabled.
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path,
+ '-Xfetch', '--no-sync', '-cfast', '-Fp');
+ push @backup, @{$tc->{'backup_flags'}};
+
+ my @verify = ('pg_verifybackup', '-e', $backup_path);
+
+ # A backup with a valid compression method should work.
+ $primary->command_ok(\@backup, "backup ok with compression method \"$method\"");
+
+ # Verify that backup is extracted
+ if ($method ne 'none')
+ {
+ ok (-f "$backup_path/PG_VERSION", "extracted compressed backup, compression method \"$method\"");
+ }
+ ok(-f "$backup_path/backup_manifest", "backup manifest exists, compression method \"$method\"");
+
+ # Make sure that it verifies OK.
+ $primary->command_ok(\@verify,
+ "verify backup with compression method \"$method\"");
+ }
+
+ # Remove backup immediately to save disk space.
+ rmtree($backup_path);
+}
--
1.8.3.1
On 1/27/22 2:15 AM, Robert Haas wrote:
The attached patch should fix this, too.
Thanks, the issues seem to be fixed now.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Thu, Jan 27, 2022 at 7:15 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 1/27/22 2:15 AM, Robert Haas wrote:
The attached patch should fix this, too.
Thanks, the issues seem to be fixed now.
Cool. I committed that patch.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 1/27/22 10:17 PM, Robert Haas wrote:
Cool. I committed that patch.
Thanks. Please refer to this scenario where the level is set to 0 for
server-gzip but the output is still compressed.

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/11 --gzip --compress=0 -Xnone
NOTICE: all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/11
16384.tar backup_manifest base.tar

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/10 --gzip --compress=server-gzip:0 -Xnone
NOTICE: all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/10
16384.tar.gz backup_manifest base.tar.gz
0 means no compression, so the output should not be compressed if we
specify server-gzip:0; shouldn't both of the above scenarios match?
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Thu, Jan 27, 2022 at 12:08 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 1/27/22 10:17 PM, Robert Haas wrote:
Cool. I committed that patch.
Thanks. Please refer to this scenario where the level is set to 0 for
server-gzip but the output is still compressed.

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/11 --gzip --compress=0 -Xnone
NOTICE: all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/11
16384.tar backup_manifest base.tar

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/10 --gzip --compress=server-gzip:0 -Xnone
NOTICE: all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/10
16384.tar.gz backup_manifest base.tar.gz

0 means no compression, so the output should not be compressed if we
specify server-gzip:0; shouldn't both of the above scenarios match?
Well what's weird here is that you are using both --gzip and also
--compress. Those both control the same behavior, so it's a surprising
idea to specify both. But I guess if someone does, we should make the
second one fully override the first one. Here's a patch to try to do
that.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
overwrite-compression-level.patch
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 851f03ca81..897f8982a2 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -966,6 +966,12 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
int firstlen;
char *firstpart;
+ /*
+ * clear 'levelres' so that if there are multiple compression options,
+ * the last one fully overrides the earlier ones
+ */
+ *levelres = 0;
+
/* check if the option is split in two */
sep = strchr(src, ':');
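To make the intended last-option-wins semantics concrete, here is a heavily simplified, hypothetical stand-in for the option handling; the struct and function names below are made up for illustration and are not the real pg_basebackup code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pg_basebackup's compression option state. */
typedef struct
{
    char        method[32];
    int         level;
} compress_spec;

/*
 * Parse one --compress-style argument. Called once per occurrence on the
 * command line, so resetting the level here makes the final occurrence
 * fully override anything set by earlier ones.
 */
static void
parse_compress_option(const char *src, compress_spec *spec)
{
    const char *sep = strchr(src, ':');

    /* clear the level so a later option fully overrides an earlier one */
    spec->level = 0;

    if (sep == NULL)
        snprintf(spec->method, sizeof(spec->method), "%s", src);
    else
    {
        snprintf(spec->method, sizeof(spec->method), "%.*s",
                 (int) (sep - src), src);
        spec->level = atoi(sep + 1);
    }
}

int
main(void)
{
    compress_spec spec = {"none", 0};

    /* --gzip followed by --compress=server-gzip:5: the second one wins */
    parse_compress_option("gzip", &spec);
    parse_compress_option("server-gzip:5", &spec);
    printf("method=%s level=%d\n", spec.method, spec.level);
    return 0;
}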
On Thu, Jan 27, 2022 at 2:37 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
I have updated the patches to support server compression (gzip) for
plain format backup. Please find attached v4 patches.
I made a pass over these patches today and made a bunch of minor
corrections. New version attached. The two biggest things I changed
are (1) s/gzip_extractor/gzip_decompressor/, because I feel like you
extract an archive like a tarfile, but that is not what is happening
here, this is not an archive, and (2) I took a few bits out of the
test case that didn't seem to be necessary. There wasn't any reason
that I could see why testing for PG_VERSION needed to be skipped when
the compression method is 'none', so my first thought was to just take
out the 'if' statement around that, but then after more thought that
test and the one for pg_verifybackup are certainly going to fail if
those files are not present, so why have an extra test? It might make
sense if we were only conditionally able to run pg_verifybackup and
wanted to have some test coverage even when we can't, but that's not
the case here, so I see no point.
I studied this a bit to see whether I needed to make any adjustments
along the lines of 4f0bcc735038e96404fae59aa16ef9beaf6bb0aa in order
for this to work on msys. I think I don't, because 002_algorithm.pl
and 003_corruption.pl both pass $backup_path, not $real_backup_path,
to command_ok -- and I think something inside there does the
translation, which is weird, but we might as well be consistent.
008_untar.pl and 4f0bcc735038e96404fae59aa16ef9beaf6bb0aa needed to do
something different because --target server:X confused the msys magic,
but I think that shouldn't be an issue for this patch. However, I
might be wrong.
Barring objections or problems, I plan to commit this version
tomorrow. I'd do it today, but I have plans for tonight that are
incompatible with discovering that the build farm hates this ....
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v5-0001-Allow-server-side-compression-to-be-used-with-Fp.patch
From f941e48138888dac0bccb36cfa2d5026f153c90c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 27 Jan 2022 13:09:00 -0500
Subject: [PATCH v5] Allow server-side compression to be used with -Fp.
If you have a low-bandwidth connection between the client and the
server, it's reasonable to want to compress on the server side but
then decompress and extract the backup on the client side. This
commit allows you to do just that.
Dipesh Pandit, with minor and mostly cosmetic changes by me.
Discussion: http://postgr.es/m/CAN1g5_HiSh8ajUMd4ePtGyCXo89iKZTzaNyzP_qv1eJbi4YHXA@mail.gmail.com
---
doc/src/sgml/ref/pg_basebackup.sgml | 7 +-
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_file.c | 182 -----------
src/bin/pg_basebackup/bbstreamer_gzip.c | 380 +++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 19 +-
src/bin/pg_verifybackup/t/009_extract.pl | 61 ++++
7 files changed, 465 insertions(+), 186 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_gzip.c
create mode 100644 src/bin/pg_verifybackup/t/009_extract.pl
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index a5e03d2c66..dfd8aebc9a 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -428,8 +428,11 @@ PostgreSQL documentation
</para>
<para>
When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. Compression is not
- available in plain format.
+ automatically be added to all tar filenames. When the plain format is
+ used, client-side compression may not be specified, but it is
+ still possible to request server-side compression. If this is done,
+ the server will compress the backup for transmission, and the
+ client will decompress and extract it.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 5b18851e5c..78d96c649c 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -38,6 +38,7 @@ OBJS = \
BBOBJS = \
pg_basebackup.o \
bbstreamer_file.o \
+ bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_tar.o
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fc88b50126..fe49ae35e5 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
+extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
index 77ca2221a0..d721f87891 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -11,10 +11,6 @@
#include "postgres_fe.h"
-#ifdef HAVE_LIBZ
-#include <zlib.h>
-#endif
-
#include <unistd.h>
#include "bbstreamer.h"
@@ -30,15 +26,6 @@ typedef struct bbstreamer_plain_writer
bool should_close_file;
} bbstreamer_plain_writer;
-#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
-{
- bbstreamer base;
- char *pathname;
- gzFile gzfile;
-} bbstreamer_gzip_writer;
-#endif
-
typedef struct bbstreamer_extractor
{
bbstreamer base;
@@ -62,22 +49,6 @@ const bbstreamer_ops bbstreamer_plain_writer_ops = {
.free = bbstreamer_plain_writer_free
};
-#ifdef HAVE_LIBZ
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member,
- const char *data, int len,
- bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
-static const char *get_gz_error(gzFile gzf);
-
-const bbstreamer_ops bbstreamer_gzip_writer_ops = {
- .content = bbstreamer_gzip_writer_content,
- .finalize = bbstreamer_gzip_writer_finalize,
- .free = bbstreamer_gzip_writer_free
-};
-#endif
-
static void bbstreamer_extractor_content(bbstreamer *streamer,
bbstreamer_member *member,
const char *data, int len,
@@ -195,159 +166,6 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
pfree(mystreamer);
}
-/*
- * Create a bbstreamer that just compresses data using gzip, and then writes
- * it to a file.
- *
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
- * for error reporting purposes; if file is NULL, it is also the opened and
- * closed so that the data may be written there.
- */
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
-{
-#ifdef HAVE_LIBZ
- bbstreamer_gzip_writer *streamer;
-
- streamer = palloc0(sizeof(bbstreamer_gzip_writer));
- *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
- &bbstreamer_gzip_writer_ops;
-
- streamer->pathname = pstrdup(pathname);
-
- if (file == NULL)
- {
- streamer->gzfile = gzopen(pathname, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not create compressed file \"%s\": %m",
- pathname);
- exit(1);
- }
- }
- else
- {
- int fd = dup(fileno(file));
-
- if (fd < 0)
- {
- pg_log_error("could not duplicate stdout: %m");
- exit(1);
- }
-
- streamer->gzfile = gzdopen(fd, "wb");
- if (streamer->gzfile == NULL)
- {
- pg_log_error("could not open output file: %m");
- exit(1);
- }
- }
-
- if (gzsetparams(streamer->gzfile, compresslevel,
- Z_DEFAULT_STRATEGY) != Z_OK)
- {
- pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
- exit(1);
- }
-
- return &streamer->base;
-#else
- pg_log_error("this build does not support compression");
- exit(1);
-#endif
-}
-
-#ifdef HAVE_LIBZ
-/*
- * Write archive content to gzip file.
- */
-static void
-bbstreamer_gzip_writer_content(bbstreamer *streamer,
- bbstreamer_member *member, const char *data,
- int len, bbstreamer_archive_context context)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- if (len == 0)
- return;
-
- errno = 0;
- if (gzwrite(mystreamer->gzfile, data, len) != len)
- {
- /* if write didn't set errno, assume problem is no disk space */
- if (errno == 0)
- errno = ENOSPC;
- pg_log_error("could not write to compressed file \"%s\": %s",
- mystreamer->pathname, get_gz_error(mystreamer->gzfile));
- exit(1);
- }
-}
-
-/*
- * End-of-archive processing when writing to a gzip file consists of just
- * calling gzclose.
- *
- * It makes no difference whether we opened the file or the caller did it,
- * because libz provides no way of avoiding a close on the underling file
- * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
- * work around this issue, so that the behavior from the caller's viewpoint
- * is the same as for bbstreamer_plain_writer.
- */
-static void
-bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- errno = 0; /* in case gzclose() doesn't set it */
- if (gzclose(mystreamer->gzfile) != 0)
- {
- pg_log_error("could not close compressed file \"%s\": %m",
- mystreamer->pathname);
- exit(1);
- }
-
- mystreamer->gzfile = NULL;
-}
-
-/*
- * Free memory associated with this bbstreamer.
- */
-static void
-bbstreamer_gzip_writer_free(bbstreamer *streamer)
-{
- bbstreamer_gzip_writer *mystreamer;
-
- mystreamer = (bbstreamer_gzip_writer *) streamer;
-
- Assert(mystreamer->base.bbs_next == NULL);
- Assert(mystreamer->gzfile == NULL);
-
- pfree(mystreamer->pathname);
- pfree(mystreamer);
-}
-
-/*
- * Helper function for libz error reporting.
- */
-static const char *
-get_gz_error(gzFile gzf)
-{
- int errnum;
- const char *errmsg;
-
- errmsg = gzerror(gzf, &errnum);
- if (errnum == Z_ERRNO)
- return strerror(errno);
- else
- return errmsg;
-}
-#endif
-
/*
* Create a bbstreamer that extracts an archive.
*
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
new file mode 100644
index 0000000000..2c16e62882
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -0,0 +1,380 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_gzip.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_gzip.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#ifdef HAVE_LIBZ
+#include <zlib.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBZ
+typedef struct bbstreamer_gzip_writer
+{
+ bbstreamer base;
+ char *pathname;
+ gzFile gzfile;
+} bbstreamer_gzip_writer;
+
+typedef struct bbstreamer_gzip_decompressor
+{
+ bbstreamer base;
+ z_stream zstream;
+ size_t bytes_written;
+} bbstreamer_gzip_decompressor;
+
+static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
+static const char *get_gz_error(gzFile gzf);
+
+const bbstreamer_ops bbstreamer_gzip_writer_ops = {
+ .content = bbstreamer_gzip_writer_content,
+ .finalize = bbstreamer_gzip_writer_finalize,
+ .free = bbstreamer_gzip_writer_free
+};
+
+static void bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_gzip_decompressor_free(bbstreamer *streamer);
+static void *gzip_palloc(void *opaque, unsigned items, unsigned size);
+static void gzip_pfree(void *opaque, void *address);
+
+const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
+ .content = bbstreamer_gzip_decompressor_content,
+ .finalize = bbstreamer_gzip_decompressor_finalize,
+ .free = bbstreamer_gzip_decompressor_free
+};
+#endif
+
+/*
+ * Create a bbstreamer that just compresses data using gzip, and then writes
+ * it to a file.
+ *
+ * As in the case of bbstreamer_plain_writer_new, pathname is always used
+ * for error reporting purposes; if file is NULL, it is also opened and
+ * closed so that the data may be written there.
+ */
+bbstreamer *
+bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_writer *streamer;
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_writer));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_writer_ops;
+
+ streamer->pathname = pstrdup(pathname);
+
+ if (file == NULL)
+ {
+ streamer->gzfile = gzopen(pathname, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not create compressed file \"%s\": %m",
+ pathname);
+ exit(1);
+ }
+ }
+ else
+ {
+ int fd = dup(fileno(file));
+
+ if (fd < 0)
+ {
+ pg_log_error("could not duplicate stdout: %m");
+ exit(1);
+ }
+
+ streamer->gzfile = gzdopen(fd, "wb");
+ if (streamer->gzfile == NULL)
+ {
+ pg_log_error("could not open output file: %m");
+ exit(1);
+ }
+ }
+
+ if (gzsetparams(streamer->gzfile, compresslevel,
+ Z_DEFAULT_STRATEGY) != Z_OK)
+ {
+ pg_log_error("could not set compression level %d: %s",
+ compresslevel, get_gz_error(streamer->gzfile));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Write archive content to gzip file.
+ */
+static void
+bbstreamer_gzip_writer_content(bbstreamer *streamer,
+ bbstreamer_member *member, const char *data,
+ int len, bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ if (len == 0)
+ return;
+
+ errno = 0;
+ if (gzwrite(mystreamer->gzfile, data, len) != len)
+ {
+ /* if write didn't set errno, assume problem is no disk space */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_log_error("could not write to compressed file \"%s\": %s",
+ mystreamer->pathname, get_gz_error(mystreamer->gzfile));
+ exit(1);
+ }
+}
+
+/*
+ * End-of-archive processing when writing to a gzip file consists of just
+ * calling gzclose.
+ *
+ * It makes no difference whether we opened the file or the caller did it,
+ * because libz provides no way of avoiding a close on the underlying file
+ * handle. Notice, however, that bbstreamer_gzip_writer_new() uses dup() to
+ * work around this issue, so that the behavior from the caller's viewpoint
+ * is the same as for bbstreamer_plain_writer.
+ */
+static void
+bbstreamer_gzip_writer_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ errno = 0; /* in case gzclose() doesn't set it */
+ if (gzclose(mystreamer->gzfile) != 0)
+ {
+ pg_log_error("could not close compressed file \"%s\": %m",
+ mystreamer->pathname);
+ exit(1);
+ }
+
+ mystreamer->gzfile = NULL;
+}
+
+/*
+ * Free memory associated with this bbstreamer.
+ */
+static void
+bbstreamer_gzip_writer_free(bbstreamer *streamer)
+{
+ bbstreamer_gzip_writer *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_writer *) streamer;
+
+ Assert(mystreamer->base.bbs_next == NULL);
+ Assert(mystreamer->gzfile == NULL);
+
+ pfree(mystreamer->pathname);
+ pfree(mystreamer);
+}
+
+/*
+ * Helper function for libz error reporting.
+ */
+static const char *
+get_gz_error(gzFile gzf)
+{
+ int errnum;
+ const char *errmsg;
+
+ errmsg = gzerror(gzf, &errnum);
+ if (errnum == Z_ERRNO)
+ return strerror(errno);
+ else
+ return errmsg;
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of gzip
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_gzip_decompressor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZ
+ bbstreamer_gzip_decompressor *streamer;
+ z_stream *zs;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_gzip_decompressor));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_gzip_decompressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ zs = &streamer->zstream;
+ zs->zalloc = gzip_palloc;
+ zs->zfree = gzip_pfree;
+ zs->next_out = (uint8 *) streamer->base.bbs_buffer.data;
+ zs->avail_out = streamer->base.bbs_buffer.maxlen;
+
+ /*
+ * Data compression was initialized using deflateInit2 to request a gzip
+ * header. Similarly, we are using inflateInit2 to initialize data
+ * decompression.
+ *
+ * Per the documentation for inflateInit2, the second argument is
+ * "windowBits" and its value must be greater than or equal to the value
+ * provided while compressing the data, so we are using the maximum
+ * possible value for safety.
+ */
+ if (inflateInit2(zs, 15 + 16) != Z_OK)
+ {
+ pg_log_error("could not initialize compression library");
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZ
+/*
+ * Decompress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer is full, pass on the decompressed data
+ * to the next streamer.
+ */
+static void
+bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_gzip_decompressor *mystreamer;
+ z_stream *zs;
+
+ mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+
+ zs = &mystreamer->zstream;
+ zs->next_in = (uint8 *) data;
+ zs->avail_in = len;
+
+ /* Process the current chunk */
+ while (zs->avail_in > 0)
+ {
+ int res;
+
+ Assert(mystreamer->bytes_written < mystreamer->base.bbs_buffer.maxlen);
+
+ zs->next_out = (uint8 *)
+ mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ zs->avail_out =
+ mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * This call decompresses data starting at zs->next_in and updates
+ * zs->next_in and zs->avail_in. It generates output data starting at
+ * zs->next_out and updates zs->next_out and zs->avail_out accordingly.
+ */
+ res = inflate(zs, Z_NO_FLUSH);
+
+ if (res == Z_STREAM_ERROR)
+ pg_log_error("could not decompress data: %s", zs->msg);
+
+ mystreamer->bytes_written =
+ mystreamer->base.bbs_buffer.maxlen - zs->avail_out;
+
+ /* If output buffer is full then pass data to next streamer */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen, context);
+ mystreamer->bytes_written = 0;
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_gzip_decompressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_gzip_decompressor *mystreamer;
+
+ mystreamer = (bbstreamer_gzip_decompressor *) streamer;
+
+ /*
+ * At the end of the stream, if there is any pending data in the output
+ * buffer, we must forward it to the next streamer.
+ */
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_gzip_decompressor_free(bbstreamer *streamer)
+{
+ bbstreamer_free(streamer->bbs_next);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+
+/*
+ * Wrapper function to adjust the signature of palloc to match what libz
+ * expects.
+ */
+static void *
+gzip_palloc(void *opaque, unsigned items, unsigned size)
+{
+ return palloc(items * size);
+}
+
+/*
+ * Wrapper function to adjust the signature of pfree to match what libz
+ * expects.
+ */
+static void
+gzip_pfree(void *opaque, void *address)
+{
+ pfree(address);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 851f03ca81..e254fcf82a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1113,7 +1113,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
- bool is_tar;
+ bool is_tar,
+ is_tar_gz;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1128,6 +1129,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar = (archive_name_len > 4 &&
strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
+ /* Is this a gzip archive? */
+ is_tar_gz = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1137,7 +1142,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar)
+ if (must_parse_archive && !is_tar && !is_tar_gz)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1251,6 +1256,16 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
+#ifdef HAVE_LIBZ
+ /*
+ * If the user has requested a server-compressed archive along with
+ * archive extraction at the client, then we need to decompress it.
+ */
+ if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+ compressloc == COMPRESS_LOCATION_SERVER)
+ streamer = bbstreamer_gzip_decompressor_new(streamer);
+#endif
+
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
return streamer;
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
new file mode 100644
index 0000000000..f1091ffea7
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -0,0 +1,61 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test aims to verify that the client can decompress and extract
+# a backup which was compressed by the server.
+
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 4;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--compress', 'server-gzip:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $backup_path = $primary->backup_dir . '/' . 'extract_backup';
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+
+ # Take backup with server compression enabled.
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path,
+ '-Xfetch', '--no-sync', '-cfast', '-Fp');
+ push @backup, @{$tc->{'backup_flags'}};
+
+ my @verify = ('pg_verifybackup', '-e', $backup_path);
+
+ # A backup with a valid compression method should work.
+ $primary->command_ok(\@backup,
+ "backup done, compression method \"$method\"");
+
+ # Make sure that it verifies OK.
+ $primary->command_ok(\@verify,
+ "backup verified, compression method \"$method\"");
+ }
+
+ # Remove backup immediately to save disk space.
+ rmtree($backup_path);
+}
--
2.24.3 (Apple Git-128)
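For anyone skimming the v5 patch, the bbstreamer pipeline boils down to a chain of nodes, each of which transforms its input and forwards the result to the next node, with finalize propagating end-of-stream down the chain. The toy program below mirrors that shape with an upper-casing node feeding a stdout writer; the names are hypothetical, and this is not the real bbstreamer API:

#include <ctype.h>
#include <stdio.h>

/* Hypothetical, stripped-down mirror of the bbstreamer chaining idea. */
typedef struct streamer streamer;
struct streamer
{
    void        (*content) (streamer *self, const char *data, int len);
    void        (*finalize) (streamer *self);
    streamer   *next;
};

static void
stdout_content(streamer *self, const char *data, int len)
{
    fwrite(data, 1, len, stdout);
}

static void
stdout_finalize(streamer *self)
{
    fflush(stdout);
}

static void
upcase_content(streamer *self, const char *data, int len)
{
    char        buf[256];       /* assumes len <= sizeof(buf) */
    int         i;

    for (i = 0; i < len; i++)
        buf[i] = toupper((unsigned char) data[i]);
    self->next->content(self->next, buf, len);  /* forward downstream */
}

static void
upcase_finalize(streamer *self)
{
    self->next->finalize(self->next);   /* propagate end-of-stream */
}

int
main(void)
{
    streamer    sink = {stdout_content, stdout_finalize, NULL};
    streamer    xform = {upcase_content, upcase_finalize, &sink};

    xform.content(&xform, "hello, chain\n", 13);
    xform.finalize(&xform);
    return 0;
}

In the patch, bbstreamer_gzip_decompressor_new(streamer) plugs a decompression node in front of whatever chain was already built, which is why it can be inserted with a single line in CreateBackupStreamer().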
Hi,
I made a pass over these patches today and made a bunch of minor
corrections. New version attached. The two biggest things I changed
are (1) s/gzip_extractor/gzip_decompressor/, because I feel like you
extract an archive like a tarfile, but that is not what is happening
here, this is not an archive, and (2) I took a few bits out of the
test case that didn't seem to be necessary. There wasn't any reason
that I could see why testing for PG_VERSION needed to be skipped when
the compression method is 'none', so my first thought was to just take
out the 'if' statement around that, but then after more thought that
test and the one for pg_verifybackup are certainly going to fail if
those files are not present, so why have an extra test? It might make
sense if we were only conditionally able to run pg_verifybackup and
wanted to have some test coverage even when we can't, but that's not
the case here, so I see no point.
Thanks. This makes sense.
+#ifdef HAVE_LIBZ
+ /*
+ * If the user has requested a server-compressed archive along with
+ * archive extraction at the client, then we need to decompress it.
+ */
+ if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+ compressloc == COMPRESS_LOCATION_SERVER)
+ streamer = bbstreamer_gzip_decompressor_new(streamer);
+#endif
I think the HAVE_LIBZ check in pg_basebackup.c is not required when
creating a new gzip writer/decompressor. This check is already in place
in bbstreamer_gzip_writer_new() and bbstreamer_gzip_decompressor_new(),
and it throws an error in case the build does not have the required
library support. I have removed this check from pg_basebackup.c and
prepared a delta patch, which can be applied on top of the v5 patch.
Thanks,
Dipesh
Attachments:
remove-check-for-library-support.patch
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 46ab60d..1f81bbf 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1199,7 +1199,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
compressloc != COMPRESS_LOCATION_CLIENT)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
-#ifdef HAVE_LIBZ
else if (compressmethod == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
@@ -1207,7 +1206,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file,
compresslevel);
}
-#endif
else
{
Assert(false); /* not reachable */
@@ -1256,7 +1254,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
else if (expect_unterminated_tarfile)
streamer = bbstreamer_tar_terminator_new(streamer);
-#ifdef HAVE_LIBZ
/*
* If the user has requested a server-compressed archive along with
* archive extraction at the client, then we need to decompress it.
@@ -1264,7 +1261,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
compressloc == COMPRESS_LOCATION_SERVER)
streamer = bbstreamer_gzip_decompressor_new(streamer);
-#endif
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
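The idiom this delta patch relies on is that the constructor itself carries the #ifdef, so every call site stays guard-free. A minimal sketch of that pattern, with hypothetical names and simplified error handling:

#include <stdio.h>
#include <stdlib.h>

/*
 * The constructor carries the #ifdef itself, so callers need no guard of
 * their own. Names here are illustrative, not the real bbstreamer code.
 */
static void *
gzip_thing_new(void)
{
#ifdef HAVE_LIBZ
    return malloc(1);           /* stand-in for real construction */
#else
    fprintf(stderr, "this build does not support compression\n");
    exit(1);
#endif
}

int
main(void)
{
    /*
     * No #ifdef needed here: an unsupported build fails inside the
     * constructor with a clear runtime error.
     */
    void       *streamer = gzip_thing_new();

    free(streamer);
    return 0;
}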
On 1/27/22 11:12 PM, Robert Haas wrote:
Well what's weird here is that you are using both --gzip and also
--compress. Those both control the same behavior, so it's a surprising
idea to specify both. But I guess if someone does, we should make the
second one fully override the first one. Here's a patch to try to do
that.
Right, the current behavior was:

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/y101 --gzip -Z none -Xnone
pg_basebackup: error: cannot use compression level with method none
Try "pg_basebackup --help" for more information.

and even this was not matching the PG v14 behavior, e.g.

./pg_basebackup -Ft -z -Z none -D /tmp/test1 (working in PG v14 but
throwing the above error on PG HEAD)

so somewhere we were breaking backward compatibility.

Now with your patch this seems to work fine:

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/y101 --gzip -Z none -Xnone
NOTICE: WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
[edb@centos7tushar bin]$ ls /tmp/y101
backup_manifest base.tar

OR

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/y0p -Z none -Xfetch -z
[edb@centos7tushar bin]$ ls /tmp/y0p
backup_manifest base.tar.gz

But what about server-gzip:0? Should it allow compressing the output?

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/1 --compress=server-gzip:0 -Xfetch
[edb@centos7tushar bin]$ ls /tmp/1
backup_manifest base.tar.gz
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Fri, Jan 28, 2022 at 3:54 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Thanks. This makes sense.
+#ifdef HAVE_LIBZ
+ /*
+ * If the user has requested a server-compressed archive along with
+ * archive extraction at the client, then we need to decompress it.
+ */
+ if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+ compressloc == COMPRESS_LOCATION_SERVER)
+ streamer = bbstreamer_gzip_decompressor_new(streamer);
+#endif

I think the HAVE_LIBZ check in pg_basebackup.c is not required when
creating a new gzip writer/decompressor. This check is already in place
in bbstreamer_gzip_writer_new() and bbstreamer_gzip_decompressor_new(),
and it throws an error in case the build does not have the required
library support. I have removed this check from pg_basebackup.c and
prepared a delta patch, which can be applied on top of the v5 patch.
Right, makes sense. Committed with that change, plus I realized the
skip count in the test case file was wrong after the changes I made
yesterday, so I fixed that as well.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Robert,
I have attached the latest version of the LZ4 server-side compression
patch, rebased on the recent commits. This patch also introduces the
compression level and adds a TAP test.
Also, while adding the lz4 case in pg_verifybackup/t/008_untar.pl, I
found an unused variable, $have_zlib. I have attached a cleanup patch
for that as well.
Please review and let me know your thoughts.
Regards,
Jeevan Ladhe
Attachments:
0001-gzip-tap-test-remove-extra-variable.patch
From 52da9327027318de383715e6df712647243b8515 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Fri, 28 Jan 2022 22:57:26 +0530
Subject: [PATCH 1/2] gzip tap test: remove extra variable.
---
src/bin/pg_verifybackup/t/008_untar.pl | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index aad7568b65..d32c86e92e 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -17,7 +17,6 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
-my $have_zlib = check_pg_config("#define HAVE_LIBZ 1");
my $backup_path = $primary->backup_dir . '/server-backup';
my $real_backup_path = PostgreSQL::Test::Utils::perl2host($backup_path);
my $extract_path = $primary->backup_dir . '/extracted-backup';
--
2.25.1
v10-0002-Add-a-LZ4-compression-method-for-server-side-compres.patch
From 6529bfef967d3cbba140f5de0226f24c339d883e Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Fri, 28 Jan 2022 22:58:39 +0530
Subject: [PATCH 2/2] Add a LZ4 compression method for server side compression.
Add LZ4 server side compression option --compress=server-lz4
Provide compression-level for lz4 compression.
Add tap test scenario in pg_verifybackup for lz4.
Add documentation.
Add pg_basebackup help for lz4 option.
Example usage:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-lz4:4
---
doc/src/sgml/protocol.sgml | 7 +-
doc/src/sgml/ref/pg_basebackup.sgml | 24 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_lz4.c | 298 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 17 +-
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 10 +-
src/include/replication/basebackup_sink.h | 1 +
9 files changed, 348 insertions(+), 18 deletions(-)
create mode 100644 src/backend/replication/basebackup_lz4.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 68908dcb7b..b599bbdce5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2724,8 +2724,8 @@ The commands accepted in replication mode are:
<listitem>
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the only supported method is
- <literal>gzip</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>
+ and <literal>lz4</literal>.
</para>
</listitem>
</varlistentry>
@@ -2736,7 +2736,8 @@ The commands accepted in replication mode are:
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- The value should be an integer between 1 and 9.
+ For <literal>gzip</literal> the value should be an integer between 1
+ and 9, and for <literal>lz4</literal> it should be between 1 and 12.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index dfd8aebc9a..9e9681ed77 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -416,10 +416,13 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to either <literal>gzip</literal>
- for compression with <application>gzip</application>, or
- <literal>none</literal> for no compression. A compression level
- can be optionally specified, by appending the level number after a
+ The compression method can be set to <literal>gzip</literal> for
+ compression with <application>gzip</application>, or
+ <literal>lz4</literal> for compression with
+ <application>lz4</application>, or <literal>none</literal> for no
+ compression. However, <literal>lz4</literal> can currently only be
+ used with <literal>server</literal>. A compression level can be
+ optionally specified, by appending the level number after a
colon (<literal>:</literal>). If no level is specified, the default
compression level will be used. If only a level is specified without
mentioning an algorithm, <literal>gzip</literal> compression will
@@ -427,12 +430,13 @@ PostgreSQL documentation
used if the level is 0.
</para>
<para>
- When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. When the plain format is
- used, client-side compression may not be specified, but it is
- still possible to request server-side compression. If this is done,
- the server will compress the backup for transmission, and the
- client will decompress and extract it.
+ When the tar format is used with <literal>gzip</literal> or
+ <literal>lz4</literal>, the suffix <filename>.gz</filename> or
+ <filename>.lz4</filename> will automatically be added to all tar
+ filenames. When the plain format is used, client-side compression may
+ not be specified, but it is still possible to request server-side
+ compression. If this is done, the server will compress the backup for
+ transmission, and the client will decompress and extract it.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 10ce2406c0..d54f30d8b4 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -904,6 +905,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_NONE;
else if (strcmp(optval, "gzip") == 0)
opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1028,6 +1031,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..2a169d2e67
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,298 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs lz4 compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_lz4_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 12);
+
+ if (compresslevel < 0 || compresslevel > 12)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->compressionLevel = mysink->compresslevel;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires the output buffer of size equal or
+ * greater than that of LZ4F_compressBound(), make sure we have the next
+ * sink's bbs_buffer of length that can accommodate the compressed input
+ * buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written += headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed by
+ * LZ4F_compressBound(), ask the next sink to process the data so that we
+ * can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written += compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written += compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling LZ4F_freeCompressionContext(), if needed, to avoid a memory leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1f81bbf4e2..c3a3a77cd9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -997,6 +997,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
+ {
+ *methodres = COMPRESSION_LZ4;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1924,6 +1929,9 @@ BaseBackup(void)
case COMPRESSION_GZIP:
compressmethodstr = "gzip";
break;
+ case COMPRESSION_LZ4:
+ compressmethodstr = "lz4";
+ break;
default:
Assert(false);
break;
@@ -2766,8 +2774,11 @@ main(int argc, char **argv)
}
break;
case COMPRESSION_LZ4:
- /* option not supported */
- Assert(false);
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ pg_log_error("client compression not supported with lz4");
+ exit(1);
+ }
break;
}
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 1ae818f9a1..851233a6e0 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -9,6 +9,7 @@ export TAR
# used by the command "gzip" to pass down options, so stick with a different
# name.
export GZIP_PROGRAM=$(GZIP)
+export LZ4=$(LZ4)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index d32c86e92e..eeaff4d8f6
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 6;
+use Test::More tests => 9;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -35,6 +35,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'GZIP_PROGRAM'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'server-lz4'],
+ 'backup_archive' => 'base.tar.lz4',
+ 'decompress_program' => $ENV{'LZ4'},
+ 'decompress_flags' => [ '-m' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
}
);
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index d3276b2487..a8db8e9a0e 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
I have attached the latest version of the LZ4 server-side compression
patch, rebased on top of the recent commits. This patch also introduces the
compression level option and adds a TAP test.
In view of this morning's commit of
d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
committing this patch has gone up. We need to make it support
decompression with LZ4 on the client side, as we now have for gzip.
Other comments:
- Even if we were going to support LZ4 only on the server side, surely
it's not right to refuse --compress lz4 and --compress client-lz4 at
the parsing stage. I don't even think the message you added to main()
is reachable.
- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.
Other than that this seems like it's in pretty good shape.
Also, while adding the lz4 case in the pg_verifybackup/t/008_untar.pl, I found
an unused variable {have_zlib}. I have attached a cleanup patch for that as well.
This part seems clearly correct, so I have committed it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jan 29, 2022 at 1:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
I have attached the latest version of the LZ4 server-side compression
patch, rebased on top of the recent commits. This patch also introduces the
compression level option and adds a TAP test.
In view of this morning's commit of
d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
committing this patch has gone up. We need to make it support
decompression with LZ4 on the client side, as we now have for gzip.
Fair enough. Makes sense.
- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.
'-d' is the default when the file has a .lz4 extension, which is true in
our case, hence I eliminated that option. As for introducing '-m': without
any option, and even with an explicit '-d', the lz4 command was oddly
writing the decompressed tar to the console. That's when I noticed these
two passages in my lz4 man page and tried adding '-m', which worked:
"It is considered bad practice to rely on implicit output in scripts
because the script's environment may change. Always use explicit
output in scripts. -c ensures that output will be stdout. Conversely,
providing a destination name, or using -m ensures that the output will
be either the specified name, or filename.lz4 respectively."
and
"Similarly, lz4 -m -d can decompress multiple *.lz4 files."
This part seems clearly correct, so I have committed it.
Thanks for pushing it.
Regards,
Jeevan Ladhe
Hi Robert,
I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.
Please find the attached patch with the following changes:
- Even if we were going to support LZ4 only on the server side, surely
it's not right to refuse --compress lz4 and --compress client-lz4 at
the parsing stage. I don't even think the message you added to main()
is reachable.
I think you are right. I have removed the message and reintroduced
the Assert().
- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.
As explained earlier, in the TAP test the 'lz4 -d base.tar.lz4' command was
writing the decompressed output to stdout. Now, I have removed the '-m',
kept '-d' for decompression, and also added the target file explicitly to
the command.
Regards,
Jeevan Ladhe
Attachments:
v11-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
From bae601f0cfc11ef013e6a3fef39d2fa42979ff24 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Fri, 28 Jan 2022 22:58:39 +0530
Subject: [PATCH] Add a LZ4 compression method for server side compression.
Add LZ4 server side compression option --compress=server-lz4
Provide compression-level for lz4 compression.
Add tap test scenario in pg_verifybackup for lz4.
Add documentation.
Add pg_basebackup help for lz4 option.
Example usage:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-lz4:4
---
doc/src/sgml/protocol.sgml | 7 +-
doc/src/sgml/ref/pg_basebackup.sgml | 24 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_lz4.c | 298 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 10 +-
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 13 +-
src/include/replication/basebackup_sink.h | 1 +
9 files changed, 346 insertions(+), 16 deletions(-)
create mode 100644 src/backend/replication/basebackup_lz4.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 68908dcb7b..b599bbdce5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2724,8 +2724,8 @@ The commands accepted in replication mode are:
<listitem>
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the only supported method is
- <literal>gzip</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>
+ and <literal>lz4</literal>.
</para>
</listitem>
</varlistentry>
@@ -2736,7 +2736,8 @@ The commands accepted in replication mode are:
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- The value should be an integer between 1 and 9.
+ For <literal>gzip</literal> the value should be an integer between 1
+ and 9, and for <literal>lz4</literal> it should be between 1 and 12.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index dfd8aebc9a..9e9681ed77 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -416,10 +416,13 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to either <literal>gzip</literal>
- for compression with <application>gzip</application>, or
- <literal>none</literal> for no compression. A compression level
- can be optionally specified, by appending the level number after a
+ The compression method can be set to <literal>gzip</literal> for
+ compression with <application>gzip</application>, or
+ <literal>lz4</literal> for compression with
+ <application>lz4</application>, or <literal>none</literal> for no
+ compression. However, <literal>lz4</literal> can currently only be
+ used with <literal>server</literal>. A compression level can be
+ optionally specified, by appending the level number after a
colon (<literal>:</literal>). If no level is specified, the default
compression level will be used. If only a level is specified without
mentioning an algorithm, <literal>gzip</literal> compression will
@@ -427,12 +430,13 @@ PostgreSQL documentation
used if the level is 0.
</para>
<para>
- When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. When the plain format is
- used, client-side compression may not be specified, but it is
- still possible to request server-side compression. If this is done,
- the server will compress the backup for transmission, and the
- client will decompress and extract it.
+ When the tar format is used with <literal>gzip</literal> or
+ <literal>lz4</literal>, the suffix <filename>.gz</filename> or
+ <filename>.lz4</filename> will automatically be added to all tar
+ filenames. When the plain format is used, client-side compression may
+ not be specified, but it is still possible to request server-side
+ compression. If this is done, the server will compress the backup for
+ transmission, and the client will decompress and extract it.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 10ce2406c0..d54f30d8b4 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -904,6 +905,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_NONE;
else if (strcmp(optval, "gzip") == 0)
opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1028,6 +1031,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..2a169d2e67
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,298 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_lz4_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 12);
+
+ if (compresslevel < 0 || compresslevel > 12)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->compressionLevel = mysink->compresslevel;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer at least as large
+ * as the value computed by LZ4F_compressBound(), make sure that the next
+ * sink's bbs_buffer is long enough to accommodate the compressed version
+ * of the input buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written += headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed by
+ * LZ4F_compressBound(), ask the next sink to process the data so that we
+ * can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written += compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written += compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling LZ4F_freeCompressionContext() if needed, to avoid a memory leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1f81bbf4e2..d637ccec93 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -997,6 +997,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
+ {
+ *methodres = COMPRESSION_LZ4;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1924,6 +1929,9 @@ BaseBackup(void)
case COMPRESSION_GZIP:
compressmethodstr = "gzip";
break;
+ case COMPRESSION_LZ4:
+ compressmethodstr = "lz4";
+ break;
default:
Assert(false);
break;
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 1ae818f9a1..851233a6e0 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -9,6 +9,7 @@ export TAR
# used by the command "gzip" to pass down options, so stick with a different
# name.
export GZIP_PROGRAM=$(GZIP)
+export LZ4=$(LZ4)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index d32c86e92e..33fd22a188
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 6;
+use Test::More tests => 9;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -35,6 +35,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'GZIP_PROGRAM'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'server-lz4'],
+ 'backup_archive' => 'base.tar.lz4',
+ 'decompress_program' => $ENV{'LZ4'},
+ 'decompress_flags' => [ '-d' ],
+ 'output_file' => 'base.tar',
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
}
);
@@ -74,6 +83,8 @@ for my $tc (@test_configuration)
push @decompress, @{$tc->{'decompress_flags'}}
if $tc->{'decompress_flags'};
push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ push @decompress, $backup_path . '/' . $tc->{'output_file'}
+ if $tc->{'output_file'};
system_or_bail(@decompress);
}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index d3276b2487..a8db8e9a0e 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
On Mon, Jan 31, 2022 at 6:11 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.
OK. I guess we should also be thinking about client-side LZ4
compression. It's probably best to focus on that before worrying about
ZSTD, even though ZSTD would be really cool to have.
- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.
As explained earlier, in the TAP test the 'lz4 -d base.tar.lz4' command was
writing the decompressed output to stdout. Now, I have removed the '-m',
kept '-d' for decompression, and also added the target file explicitly to
the command.
I don't see the behavior you describe here. For me:
[rhaas ~]$ lz4 q.lz4
Decoding file q
q.lz4 : decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -m q.lz4
[rhaas ~]$ ls q
q
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d q.lz4
Decoding file q
q.lz4 : decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d -m q.lz4
[rhaas ~]$ ls q
q
In other words, on my system, the file gets decompressed with or
without -d, and with or without -m. The only difference I see is that
using -m makes it happen silently, without printing anything on the
terminal. Anyway, I wasn't saying that using -m was necessarily wrong,
just that I didn't understand why you had it like that. Now that I'm
more informed, I recommend that we use -d -m, the former to be
explicit about wanting to decompress and the latter because it either
makes it less noisy (on my system) or makes it work at all (on yours).
It's surprising that the command behavior would be different like that
on different systems, but it is what it is. I think any set of flags
we put here is better than adding more logic in Perl, as it keeps
things simpler.
--
Robert Haas
EDB: http://www.enterprisedb.com
I think you are right. I have removed the message and reintroduced
the Assert().
In my previous version of the patch this was a problem: there should not
be an assert there, as the code is still reachable whether the option is
server-lz4 or client-lz4. I removed the assert and added a level range
check there, similar to the gzip one.
Regards,
Jeevan Ladhe
Attachments:
v12-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
From 8387555495c828274221f61ebdac86eca892b85a Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Fri, 28 Jan 2022 22:58:39 +0530
Subject: [PATCH] Add a LZ4 compression method for server side compression.
Add LZ4 server side compression option --compress=server-lz4
Provide compression-level for lz4 compression.
Add tap test scenario in pg_verifybackup for lz4.
Add documentation.
Add pg_basebackup help for lz4 option.
Example usage:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-lz4:4
---
doc/src/sgml/protocol.sgml | 7 +-
doc/src/sgml/ref/pg_basebackup.sgml | 24 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_lz4.c | 298 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 18 +-
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 13 +-
src/include/replication/basebackup_sink.h | 1 +
9 files changed, 352 insertions(+), 18 deletions(-)
create mode 100644 src/backend/replication/basebackup_lz4.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 24e93f9b28..d96b86fe03 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2729,8 +2729,8 @@ The commands accepted in replication mode are:
<listitem>
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the only supported method is
- <literal>gzip</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>
+ and <literal>lz4</literal>.
</para>
</listitem>
</varlistentry>
@@ -2741,7 +2741,8 @@ The commands accepted in replication mode are:
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- The value should be an integer between 1 and 9.
+ For <literal>gzip</literal> the value should be an integer between 1
+ and 9, and for <literal>lz4</literal> it should be between 1 and 12.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1546f10c0d..5566a366b6 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,10 +417,13 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to either <literal>gzip</literal>
- for compression with <application>gzip</application>, or
- <literal>none</literal> for no compression. A compression level
- can be optionally specified, by appending the level number after a
+ The compression method can be set to <literal>gzip</literal> for
+ compression with <application>gzip</application>, or
+ <literal>lz4</literal> for compression with
+ <application>lz4</application>, or <literal>none</literal> for no
+ compression. However, <literal>lz4</literal> can currently only be
+ used with <literal>server</literal>. A compression level can be
+ optionally specified, by appending the level number after a
colon (<literal>:</literal>). If no level is specified, the default
compression level will be used. If only a level is specified without
mentioning an algorithm, <literal>gzip</literal> compression will
@@ -428,12 +431,13 @@ PostgreSQL documentation
used if the level is 0.
</para>
<para>
- When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. When the plain format is
- used, client-side compression may not be specified, but it is
- still possible to request server-side compression. If this is done,
- the server will compress the backup for transmission, and the
- client will decompress and extract it.
+ When the tar format is used with <literal>gzip</literal> or
+ <literal>lz4</literal>, the suffix <filename>.gz</filename> or
+ <filename>.lz4</filename> will automatically be added to all tar
+ filenames. When the plain format is used, client-side compression may
+ not be specified, but it is still possible to request server-side
+ compression. If this is done, the server will compress the backup for
+ transmission, and the client will decompress and extract it.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 10ce2406c0..d54f30d8b4 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -904,6 +905,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_NONE;
else if (strcmp(optval, "gzip") == 0)
opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1028,6 +1031,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..2a169d2e67
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,298 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/* Create a new basebackup sink that performs lz4 compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_lz4_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 12);
+
+ if (compresslevel < 0 || compresslevel > 12)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->compressionLevel = mysink->compresslevel;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer at least as large
+ * as the value computed by LZ4F_compressBound(), make sure that the next
+ * sink's bbs_buffer is long enough to accommodate the compressed version
+ * of the input buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * We need to write the compressed data after the header in the output
+ * buffer. So, make sure to update the notion of bytes written to output
+ * buffer.
+ */
+ mysink->bytes_written += headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the number of available bytes has fallen below the value computed by
+ * LZ4F_compressBound(), ask the next sink to process the data so that we
+ * can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written += compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; we need to get
+ * that flushed out and also finalize the lz4 frame and then get that forwarded
+ * to the successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written += compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling LZ4F_freeCompressionContext() if needed, to avoid a memory leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index c40925c1f0..923659ddee 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -1003,6 +1003,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
+ {
+ *methodres = COMPRESSION_LZ4;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1930,6 +1935,9 @@ BaseBackup(void)
case COMPRESSION_GZIP:
compressmethodstr = "gzip";
break;
+ case COMPRESSION_LZ4:
+ compressmethodstr = "lz4";
+ break;
default:
Assert(false);
break;
@@ -2772,8 +2780,12 @@ main(int argc, char **argv)
}
break;
case COMPRESSION_LZ4:
- /* option not supported */
- Assert(false);
+ if (compresslevel > 12)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 12",
+ compresslevel, "lz4");
+ exit(1);
+ }
break;
}
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 1ae818f9a1..851233a6e0 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -9,6 +9,7 @@ export TAR
# used by the command "gzip" to pass down options, so stick with a different
# name.
export GZIP_PROGRAM=$(GZIP)
+export LZ4=$(LZ4)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index d32c86e92e..33fd22a188
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 6;
+use Test::More tests => 9;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -35,6 +35,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'GZIP_PROGRAM'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'server-lz4'],
+ 'backup_archive' => 'base.tar.lz4',
+ 'decompress_program' => $ENV{'LZ4'},
+ 'decompress_flags' => [ '-d' ],
+ 'output_file' => 'base.tar',
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
}
);
@@ -74,6 +83,8 @@ for my $tc (@test_configuration)
push @decompress, @{$tc->{'decompress_flags'}}
if $tc->{'decompress_flags'};
push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ push @decompress, $backup_path . '/' . $tc->{'output_file'}
+ if $tc->{'output_file'};
system_or_bail(@decompress);
}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index d3276b2487..a8db8e9a0e 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_copytblspc_new(void);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
On Tue, Jan 18, 2022 at 1:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
0001 adds "server" and "blackhole" as backup targets. It now has some
tests. This might be more or less ready to ship, unless somebody else
sees a problem, or I find one.
I played around with this a bit and it seems quite easy to extend this
further. So please find attached a couple more patches to generalize
this mechanism.
0001 adds an extensibility framework for backup targets. The idea is
that an extension loaded via shared_preload_libraries can call
BaseBackupAddTarget() to define a new base backup target, which the
user can then access via pg_basebackup --target TARGET_NAME, or if
they want to pass a detail string, pg_basebackup --target
TARGET_NAME:DETAIL. There might be slightly better ways of hooking
this into the system. I'm not unhappy with this approach, but there
might be a better idea out there.
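To make the shape of this API concrete, a minimal extension skeleton might
look something like the following. This is only a sketch: the "demo" target
name and both callback bodies are invented for illustration, and only
BaseBackupAddTarget() and the callback signatures come from 0001.

#include "postgres.h"

#include "replication/basebackup_target.h"

PG_MODULE_MAGIC;

/* Hypothetical callback: accept any detail string and pass it through. */
static void *
demo_check_detail(char *target, char *target_detail)
{
    return target_detail;
}

/*
 * Hypothetical callback: a real target would wrap next_sink in a custom
 * bbsink of its own; returning next_sink unchanged makes "demo" a pure
 * pass-through.
 */
static bbsink *
demo_get_sink(bbsink *next_sink, void *detail_arg)
{
    return next_sink;
}

void
_PG_init(void)
{
    BaseBackupAddTarget("demo", demo_check_detail, demo_get_sink);
}

With that loaded via shared_preload_libraries, pg_basebackup --target
demo:whatever would route the detail string through demo_check_detail()
and the backup data through whatever sink demo_get_sink() returns.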
0002 adds an example contrib module called basebackup_to_shell. The
system administrator can set basebackup_to_shell.command='SOMETHING'.
A backup directed to the 'shell' target will cause the server to
execute the configured command once per generated archive, and once
for the backup_manifest, if any. When executing the command, %f gets
replaced with the archive filename (e.g. base.tar) and %d gets
replaced with the detail. The actual contents of the file are passed
to the command's standard input, and it can then do whatever it likes
with that data. Clearly, this is not state of the art; for instance,
if what you really want is to upload the backup files someplace via
HTTP, using this to run 'curl' is probably not so good of an idea as
using an extension module that links with libcurl. That would likely
lead to better error checking, better performance, nicer
configuration, and just generally fewer things that can go wrong. On
the other hand, writing an integration in C is kind of tricky, and
this thing is quite easy to use -- and it does work.
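Just as a sketch of the substitution rule described above -- this is
illustrative, not the contrib module's actual code, and the helper name
and signature are invented -- the %f/%d expansion could be done with a
StringInfo:

#include "postgres.h"

#include "lib/stringinfo.h"

/* Replace %f with the archive file name and %d with the target detail. */
static char *
expand_shell_command(const char *cmd, const char *filename,
                     const char *detail)
{
    StringInfoData buf;
    const char *c;

    initStringInfo(&buf);
    for (c = cmd; *c != '\0'; c++)
    {
        if (c[0] == '%' && c[1] == 'f')
        {
            appendStringInfoString(&buf, filename);
            c++;        /* skip the 'f' */
        }
        else if (c[0] == '%' && c[1] == 'd')
        {
            appendStringInfoString(&buf, detail != NULL ? detail : "");
            c++;        /* skip the 'd' */
        }
        else
            appendStringInfoChar(&buf, *c);
    }
    return buf.data;
}

For example, with basebackup_to_shell.command set to something like
'cat > /backups/%d.%f' and a detail of 'mybackup', the base.tar archive
would be piped to 'cat > /backups/mybackup.base.tar'.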
There are a couple of things to be concerned about with 0002 from a
security perspective. First, in a backend environment, we have a
function to spawn a subprocess via popen(), namely OpenPipeStream(),
but there is no function to spawn a subprocess with execve() and end
up with a socket connected to its standard input. And that means that
whatever command the administrator configures is being interpreted by
the shell, which is a potential problem given that we're interpolating
the target detail string supplied by the user, who must have at least
replication privileges but need not be the superuser. I chose to
handle this by allowing the target detail to contain only alphanumeric
characters. Refinement is likely possible, but whether the effort is
worthwhile seems questionable. Second, what if the superuser wants to
allow the use of this module to only some of the users who have
replication privileges? That seems a bit unlikely but it's possible,
so I added a GUC basebackup_to_shell.required_role. If set, the
functionality is only usable by members of the named role. If unset,
anyone with replication privilege can use it. I guess someone could
criticize this as defaulting to the least secure setting, but
considering that you have to have replication privileges to use this
at all, I don't find that argument much to get excited about.
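A sketch of that alphanumeric restriction might look like this -- again
hypothetical code with an invented function name, though the callback
signature matches the check_detail hook from 0001:

#include "postgres.h"

#include <ctype.h>

/* Reject any target detail that is not purely alphanumeric. */
static void *
shell_check_detail(char *target, char *target_detail)
{
    if (target_detail != NULL)
    {
        char *c;

        for (c = target_detail; *c != '\0'; c++)
            if (!isalnum((unsigned char) *c))
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("target detail must contain only alphanumeric characters")));
    }
    return target_detail;
}

Since the detail can never reach the shell with anything but [A-Za-z0-9]
in it, quoting and metacharacter tricks are off the table, at the cost of
disallowing some harmless strings.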
I have to say that I'm incredibly happy with how easy these patches
were to write. I think this is going to make adding new base backup
targets as accessible as we can realistically hope to make it. There
is some boilerplate code, as an examination of the patches will
reveal, but it's not a lot, and at least IMHO it's pretty
straightforward. Granted, coding up a new base backup target is
something only experienced C hackers are likely to do, but the fact
that I was able to throw this together so quickly suggests to me that
I've got the design basically right, and that anyone who does want to
plug into the new mechanism shouldn't have too much trouble doing so.
Thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
0001-Allow-extensions-to-add-new-backup-targets.patch
From bb3896b115a964960a8b87063a1ac4ac5d7596bf Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 2 Feb 2022 09:50:34 -0500
Subject: [PATCH 1/2] Allow extensions to add new backup targets.
Commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc allowed for base backup
targets, meaning that we could do something with the backup other than
send it to the client, but all of those targets had to be baked in to
the core code. This commit makes it possible for extensions to define
additional backup targets.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 93 +++-----
src/backend/replication/basebackup_target.c | 238 ++++++++++++++++++++
src/include/replication/basebackup_target.h | 66 ++++++
4 files changed, 340 insertions(+), 58 deletions(-)
create mode 100644 src/backend/replication/basebackup_target.c
create mode 100644 src/include/replication/basebackup_target.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..847c87214f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -22,6 +22,7 @@ OBJS = \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
+ basebackup_target.o \
basebackup_throttle.o \
repl_gram.o \
slot.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 10ce2406c0..7d4af84cf1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -28,6 +28,7 @@
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "replication/basebackup_target.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -53,14 +54,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_TARGET_BLACKHOLE,
- BACKUP_TARGET_COMPAT,
- BACKUP_TARGET_CLIENT,
- BACKUP_TARGET_SERVER
-} backup_target_type;
-
typedef enum
{
BACKUP_COMPRESSION_NONE,
@@ -76,8 +69,9 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
- backup_target_type target;
- char *target_detail;
+ bool send_to_client;
+ bool use_copytblspc;
+ BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
basebackup_compression_type compression;
int compression_level;
@@ -714,12 +708,12 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest_checksums = false;
bool o_target = false;
bool o_target_detail = false;
- char *target_str = "compat"; /* placate compiler */
+ char *target_str = NULL;
+ char *target_detail_str = NULL;
bool o_compression = false;
bool o_compression_level = false;
MemSet(opt, 0, sizeof(*opt));
- opt->target = BACKUP_TARGET_COMPAT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
@@ -863,22 +857,11 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- target_str = defGetString(defel);
-
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(target_str, "blackhole") == 0)
- opt->target = BACKUP_TARGET_BLACKHOLE;
- else if (strcmp(target_str, "client") == 0)
- opt->target = BACKUP_TARGET_CLIENT;
- else if (strcmp(target_str, "server") == 0)
- opt->target = BACKUP_TARGET_SERVER;
- else
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", target_str)));
+ target_str = defGetString(defel);
o_target = true;
}
else if (strcmp(defel->defname, "target_detail") == 0)
@@ -889,7 +872,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->target_detail = optval;
+ target_detail_str = optval;
o_target_detail = true;
}
else if (strcmp(defel->defname, "compression") == 0)
@@ -937,22 +920,28 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
- if (opt->target == BACKUP_TARGET_SERVER)
+
+ if (target_str == NULL)
{
- if (opt->target_detail == NULL)
+ if (target_detail_str != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("target '%s' requires a target detail",
- target_str)));
+ errmsg("target detail cannot be used without target")));
+ opt->use_copytblspc = true;
+ opt->send_to_client = true;
}
- else
+ else if (strcmp(target_str, "client") == 0)
{
- if (opt->target_detail != NULL)
+ if (target_detail_str != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("target '%s' does not accept a target detail",
target_str)));
+ opt->send_to_client = true;
}
+ else
+ opt->target_handle =
+ BaseBackupGetTargetHandle(target_str, target_detail_str);
if (o_compression_level && !o_compression)
ereport(ERROR,
@@ -988,37 +977,25 @@ SendBaseBackup(BaseBackupCmd *cmd)
}
/*
- * If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If the target is specifically 'client' then set up to stream
- * the backup to the client; otherwise, it's being sent someplace else and
- * should not be sent to the client.
+ * There are basically three cases here.
+ *
+ * If the TARGET option was not specified, we have to fall back to the
+ * older and less capable copy-tablespace protocol.
+ *
+ * If the TARGET option was set to 'client', we'll use the new copy-stream
+ * protocol and send the backup to the client.
*
- * If the TARGET option was not specified, we must fall back to the older
- * and less capable copy-tablespace protocol.
+ * If the TARGET option was set to anything else, we'll use the new
+ * copy-stream protocol to send progress updates to the client, and
+ * BaseBackupGetSink will arrange to dispose of the backup data.
*/
- if (opt.target == BACKUP_TARGET_CLIENT)
- sink = bbsink_copystream_new(true);
- else if (opt.target != BACKUP_TARGET_COMPAT)
- sink = bbsink_copystream_new(false);
- else
+ if (opt.use_copytblspc)
sink = bbsink_copytblspc_new();
-
- /*
- * If a non-default backup target is in use, arrange to send the data
- * wherever it needs to go.
- */
- switch (opt.target)
+ else
{
- case BACKUP_TARGET_BLACKHOLE:
- /* Nothing to do, just discard data. */
- break;
- case BACKUP_TARGET_COMPAT:
- case BACKUP_TARGET_CLIENT:
- /* Nothing to do, handling above is sufficient. */
- break;
- case BACKUP_TARGET_SERVER:
- sink = bbsink_server_new(sink, opt.target_detail);
- break;
+ sink = bbsink_copystream_new(opt.send_to_client);
+ if (opt.target_handle != NULL)
+ sink = BaseBackupGetSink(opt.target_handle, sink);
}
/* Set up network throttling, if client requested it */
diff --git a/src/backend/replication/basebackup_target.c b/src/backend/replication/basebackup_target.c
new file mode 100644
index 0000000000..d93f5e02db
--- /dev/null
+++ b/src/backend/replication/basebackup_target.c
@@ -0,0 +1,238 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_target.c
+ * Base backups can be "targeted," which means that they can be sent
+ * somewhere other than to the client which requested the backup.
+ * Furthermore, new targets can be defined by extensions. This file
+ * contains code to support that functionality.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_target.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "replication/basebackup_target.h"
+#include "utils/memutils.h"
+
+typedef struct BaseBackupTargetType
+{
+ char *name;
+ void *(*check_detail) (char *, char *);
+ bbsink *(*get_sink) (bbsink *, void *);
+} BaseBackupTargetType;
+
+struct BaseBackupTargetHandle
+{
+ BaseBackupTargetType *type;
+ void *detail_arg;
+};
+
+static void initialize_target_list(void);
+extern bbsink *blackhole_get_sink(bbsink *next_sink, void *detail_arg);
+extern bbsink *server_get_sink(bbsink *next_sink, void *detail_arg);
+static void *reject_target_detail(char *target, char *target_detail);
+static void *server_check_detail(char *target, char *target_detail);
+
+static BaseBackupTargetType builtin_backup_targets[] =
+{
+ {
+ "blackhole", reject_target_detail, blackhole_get_sink
+ },
+ {
+ "server", server_check_detail, server_get_sink
+ },
+ {
+ NULL
+ }
+};
+
+static List *BaseBackupTargetTypeList = NIL;
+
+/*
+ * Add a new base backup target type.
+ *
+ * This is intended for use by server extensions.
+ */
+void
+BaseBackupAddTarget(char *name,
+ void *(*check_detail) (char *, char *),
+ bbsink *(*get_sink) (bbsink *, void *))
+{
+ BaseBackupTargetType *ttype;
+ MemoryContext oldcontext;
+ ListCell *lc;
+
+ /* If the target list is not yet initialized, do that first. */
+ if (BaseBackupTargetTypeList == NIL)
+ initialize_target_list();
+
+ /* Search the target type list for an existing entry with this name. */
+ foreach(lc, BaseBackupTargetTypeList)
+ {
+ BaseBackupTargetType *ttype = lfirst(lc);
+
+ if (strcmp(ttype->name, name) == 0)
+ {
+ /*
+ * We found one, so update it.
+ *
+ * It is probably not a great idea to call BaseBackupAddTarget
+ * for the same name multiple times, but if it happens, this
+ * seems like the sanest behavior.
+ */
+ ttype->check_detail = check_detail;
+ ttype->get_sink = get_sink;
+ return;
+ }
+ }
+
+ /*
+ * We use TopMemoryContext for allocations here to make sure that the
+ * data we need doesn't vanish under us; that's also why we copy the
+ * target name into a newly-allocated chunk of memory.
+ */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ ttype = palloc(sizeof(BaseBackupTargetType));
+ ttype->name = pstrdup(name);
+ ttype->check_detail = check_detail;
+ ttype->get_sink = get_sink;
+ BaseBackupTargetTypeList = lappend(BaseBackupTargetTypeList, ttype);
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Look up a base backup target and validate the target_detail.
+ *
+ * Extensions that define new backup targets will probably define a new
+ * type of bbsink to match. Validation of the target_detail can be performed
+ * either in the check_detail routine called here, or in the bbsink
+ * constructor, which will be called from BaseBackupGetSink. It's mostly
+ * a matter of taste, but the check_detail function runs somewhat earlier.
+ */
+BaseBackupTargetHandle *
+BaseBackupGetTargetHandle(char *target, char *target_detail)
+{
+ ListCell *lc;
+
+ /* If the target list is not yet initialized, do that first. */
+ if (BaseBackupTargetTypeList == NIL)
+ initialize_target_list();
+
+ /* Search the target type list for a match. */
+ foreach(lc, BaseBackupTargetTypeList)
+ {
+ BaseBackupTargetType *ttype = lfirst(lc);
+
+ if (strcmp(ttype->name, target) == 0)
+ {
+ BaseBackupTargetHandle *handle;
+
+ /* Found the target. */
+ handle = palloc(sizeof(BaseBackupTargetHandle));
+ handle->type = ttype;
+ handle->detail_arg = ttype->check_detail(target, target_detail);
+
+ return handle;
+ }
+ }
+
+ /* Did not find the target. */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unrecognized target: \"%s\"", target)));
+}
+
+/*
+ * Construct a bbsink that will implement the backup target.
+ *
+ * The get_sink function does all the real work, so all we have to do here
+ * is call it with the correct arguments. Whatever the check_detail function
+ * returned is here passed through to the get_sink function. This lets those
+ * two functions communicate with each other, if they wish. If not, the
+ * check_detail function can simply return the target_detail and let the
+ * get_sink function take it from there.
+ */
+bbsink *
+BaseBackupGetSink(BaseBackupTargetHandle *handle, bbsink *next_sink)
+{
+ return handle->type->get_sink(next_sink, handle->detail_arg);
+}
+
+/*
+ * Load predefined target types into BaseBackupTargetTypeList.
+ */
+static void
+initialize_target_list(void)
+{
+ BaseBackupTargetType *ttype = builtin_backup_targets;
+ MemoryContext oldcontext;
+
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ while (ttype->name != NULL)
+ {
+ BaseBackupTargetTypeList = lappend(BaseBackupTargetTypeList, ttype);
+ ++ttype;
+ }
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Normally, a get_sink function should construct and return a new bbsink that
+ * implements the backup target, but the 'blackhole' target just throws the
+ * data away. We could implement that by adding a bbsink that does nothing
+ * but forward, but it's even cheaper to implement that by not adding a bbsink
+ * at all.
+ */
+bbsink *
+blackhole_get_sink(bbsink *next_sink, void *detail_arg)
+{
+ return next_sink;
+}
+
+/*
+ * Create a bbsink implementing a server-side backup.
+ */
+bbsink *
+server_get_sink(bbsink *next_sink, void *detail_arg)
+{
+ return bbsink_server_new(next_sink, detail_arg);
+}
+
+/*
+ * Implement target-detail checking for a target that does not accept a
+ * detail.
+ */
+void *
+reject_target_detail(char *target, char *target_detail)
+{
+ if (target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target)));
+
+ return NULL;
+}
+
+/*
+ * Implement target-detail checking for a server-side backup.
+ *
+ * target_detail should be the name of the directory to which the backup
+ * should be written, but we don't check that here. Rather, that check,
+ * as well as the necessary permissions checking, happens in bbsink_server_new.
+ */
+void *
+server_check_detail(char *target, char *target_detail)
+{
+ if (target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target)));
+
+ return target_detail;
+}
diff --git a/src/include/replication/basebackup_target.h b/src/include/replication/basebackup_target.h
new file mode 100644
index 0000000000..e23ac29a89
--- /dev/null
+++ b/src/include/replication/basebackup_target.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_target.h
+ * Extensibility framework for adding base backup targets.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_target.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_TARGET_H
+#define BASEBACKUP_TARGET_H
+
+#include "replication/basebackup_sink.h"
+
+struct BaseBackupTargetHandle;
+typedef struct BaseBackupTargetHandle BaseBackupTargetHandle;
+
+/*
+ * Extensions can call this function to create new backup targets.
+ *
+ * 'name' is the name of the new target.
+ *
+ * 'check_detail' is a function that accepts a target name and target detail
+ * and either throws an error (if the target detail is not valid or some other
+ * problem, such as a permissions issue, is detected) or returns a pointer to
+ * the data that will be needed to create a bbsink implementing that target.
+ * The second argument will be NULL if the TARGET_DETAIL option to the
+ * BASE_BACKUP command was not specified.
+ *
+ * 'get_sink' is a function that creates the bbsink. The first argument
+ * is the successor sink; the sink created by this function should always
+ * forward to this sink. The second argument is the pointer returned by a
+ * previous call to the 'check_detail' function.
+ *
+ * In practice, a user will type something like "pg_basebackup --target foo:bar
+ * -Xfetch". That will cause the server to look for a backup target named
+ * "foo". If one is found, the check_detail callback will be invoked for the
+ * string "bar", and whatever that callback returns will be passed as the
+ * second argument to the get_sink callback.
+ */
+extern void BaseBackupAddTarget(char *name,
+ void *(*check_detail) (char *, char *),
+ bbsink * (*get_sink) (bbsink *, void *));
+
+/*
+ * These functions are used by the core code to access base backup targets
+ * added via BaseBackupAddTarget(). The core code will pass the TARGET and
+ * TARGET_DETAIL strings obtained from the user to BaseBackupGetTargetHandle,
+ * which will either throw an error (if the TARGET is not recognized or the
+ * check_detail hook for that TARGET doesn't like the TARGET_DETAIL) or
+ * return a BaseBackupTargetHandle object that can later be passed to
+ * BaseBackupGetSink.
+ *
+ * BaseBackupGetSink constructs a bbsink implementing the desired target
+ * using the BaseBackupTargetHandle and the successor bbsink. It does this
+ * by arranging to call the get_sink() callback provided by the extension
+ * that implements the base backup target.
+ */
+extern BaseBackupTargetHandle *BaseBackupGetTargetHandle(char *target,
+ char *target_detail);
+extern bbsink *BaseBackupGetSink(BaseBackupTargetHandle *handle,
+ bbsink *next_sink);
+
+#endif
--
2.24.3 (Apple Git-128)
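To make the new hook concrete, here is a minimal sketch of a loadable module registering a custom target through the basebackup_target.h API above. The module, its "demo" target name, and its pass-through behavior (equivalent to the built-in "blackhole" target) are invented purely for illustration and are not part of the patch:

#include "postgres.h"

#include "fmgr.h"
#include "replication/basebackup_target.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

/* Accept any target detail; a real target would validate it here. */
static void *
demo_check_detail(char *target, char *target_detail)
{
	return target_detail;
}

/*
 * Returning the successor sink unchanged means the backup data is simply
 * discarded, exactly as the built-in "blackhole" target does.
 */
static bbsink *
demo_get_sink(bbsink *next_sink, void *detail_arg)
{
	return next_sink;
}

void
_PG_init(void)
{
	BaseBackupAddTarget("demo", demo_check_detail, demo_get_sink);
}

With such a module loaded, something like "pg_basebackup --target demo -Xfetch" would route the backup through it. The basebackup_to_shell module below is a fuller, working example of the same pattern.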
Attachment: 0002-Add-basebackup_to_shell-contrib-module.patch (application/octet-stream)
From c2bb350e46b624689a79244310b20d8a8914b3ca Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 2 Feb 2022 09:50:41 -0500
Subject: [PATCH 2/2] Add 'basebackup_to_shell' contrib module.
As a demonstration of the sort of thing that can be done by adding a
custom backup target, this defines a 'shell' target which executes a
command defined by the system administrator. The command is executed
once for each tar archive generated by the backup and once for the
backup manifest, if any. Each time the command is executed, it
receives the contents of the file for which it is executed via standard
input.
The configured command can use %f to refer to the name of the archive
(e.g. base.tar, $TABLESPACE_OID.tar, backup_manifest) and %d to refer
to the target detail (pg_basebackup --target shell:DETAIL). A target
detail is required if %d appears in the configured command and
forbidden if it does not.
---
contrib/Makefile | 1 +
contrib/basebackup_to_shell/Makefile | 19 +
.../basebackup_to_shell/basebackup_to_shell.c | 419 ++++++++++++++++++
3 files changed, 439 insertions(+)
create mode 100644 contrib/basebackup_to_shell/Makefile
create mode 100644 contrib/basebackup_to_shell/basebackup_to_shell.c
diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..2e6df041c9 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
amcheck \
auth_delay \
auto_explain \
+ basebackup_to_shell \
bloom \
btree_gin \
btree_gist \
diff --git a/contrib/basebackup_to_shell/Makefile b/contrib/basebackup_to_shell/Makefile
new file mode 100644
index 0000000000..f31dfaae9c
--- /dev/null
+++ b/contrib/basebackup_to_shell/Makefile
@@ -0,0 +1,19 @@
+# contrib/basebackup_to_shell/Makefile
+
+MODULE_big = basebackup_to_shell
+OBJS = \
+ $(WIN32RES) \
+ basebackup_to_shell.o
+
+PGFILEDESC = "basebackup_to_shell - target basebackup to shell command"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basebackup_to_shell
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basebackup_to_shell/basebackup_to_shell.c b/contrib/basebackup_to_shell/basebackup_to_shell.c
new file mode 100644
index 0000000000..d82cb6d13f
--- /dev/null
+++ b/contrib/basebackup_to_shell/basebackup_to_shell.c
@@ -0,0 +1,419 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_to_shell.c
+ * target base backup files to a shell command
+ *
+ * Copyright (c) 2016-2022, PostgreSQL Global Development Group
+ *
+ * contrib/basebackup_to_shell/basebackup_to_shell.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "miscadmin.h"
+#include "replication/basebackup_target.h"
+#include "storage/fd.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+typedef struct bbsink_shell
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* User-supplied target detail string. */
+ char *target_detail;
+
+ /* Shell command pattern being used for this backup. */
+ char *shell_command;
+
+ /* The command that is currently running. */
+ char *current_command;
+
+ /* Pipe to the running command. */
+ FILE *pipe;
+} bbsink_shell;
+
+void _PG_init(void);
+
+static void *shell_check_detail(char *target, char *target_detail);
+static bbsink *shell_get_sink(bbsink *next_sink, void *detail_arg);
+
+static void bbsink_shell_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_shell_archive_contents(bbsink *sink, size_t len);
+static void bbsink_shell_end_archive(bbsink *sink);
+static void bbsink_shell_begin_manifest(bbsink *sink);
+static void bbsink_shell_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_shell_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_shell_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_shell_begin_archive,
+ .archive_contents = bbsink_shell_archive_contents,
+ .end_archive = bbsink_shell_end_archive,
+ .begin_manifest = bbsink_shell_begin_manifest,
+ .manifest_contents = bbsink_shell_manifest_contents,
+ .end_manifest = bbsink_shell_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+static char *shell_command = "";
+static char *shell_required_role = "";
+
+void
+_PG_init(void)
+{
+ DefineCustomStringVariable("basebackup_to_shell.command",
+ "Shell command to be executed for each backup file.",
+ NULL,
+ &shell_command,
+ "",
+ PGC_SIGHUP,
+ 0,
+ NULL, NULL, NULL);
+
+ DefineCustomStringVariable("basebackup_to_shell.required_role",
+ "Backup user must be a member of this role to use shell backup target.",
+ NULL,
+ &shell_required_role,
+ "",
+ PGC_SIGHUP,
+ 0,
+ NULL, NULL, NULL);
+
+ BaseBackupAddTarget("shell", shell_check_detail, shell_get_sink);
+}
+
+/*
+ * We choose to defer sanity checking until shell_get_sink(), and so
+ * just pass the target detail through without doing anything. However, we do
+ * permissions checks here, before any real work has been done.
+ */
+static void *
+shell_check_detail(char *target, char *target_detail)
+{
+ if (shell_required_role[0] != '\0')
+ {
+ Oid roleid;
+
+ StartTransactionCommand();
+ roleid = get_role_oid(shell_required_role, true);
+ if (!is_member_of_role(GetUserId(), roleid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to use basebackup_to_shell")));
+ CommitTransactionCommand();
+ }
+
+ return target_detail;
+}
+
+/*
+ * Set up a bbsink to implement this base backup target.
+ *
+ * This is also a convenient place to sanity check that a target detail was
+ * given if and only if %d is present.
+ */
+static bbsink *
+shell_get_sink(bbsink *next_sink, void *detail_arg)
+{
+ bbsink_shell *sink;
+ bool has_detail_escape = false;
+ char *c;
+
+ /*
+ * Set up the bbsink.
+ *
+ * We remember the current value of basebackup_to_shell.shell_command to
+ * be certain that it can't change under us during the backup.
+ */
+ sink = palloc0(sizeof(bbsink_shell));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_shell_ops;
+ sink->base.bbs_next = next_sink;
+ sink->target_detail = detail_arg;
+ sink->shell_command = pstrdup(shell_command);
+
+ /* Reject an empty shell command. */
+ if (sink->shell_command[0] == '\0')
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("shell command for backup is not configured"));
+
+ /* Determine whether the shell command we're using contains %d. */
+ for (c = sink->shell_command; *c != '\0'; ++c)
+ {
+ if (c[0] == '%' && c[1] != '\0')
+ {
+ if (c[1] == 'd')
+ has_detail_escape = true;
+ ++c;
+ }
+ }
+
+ /* There should be a target detail if %d was used, and not otherwise. */
+ if (has_detail_escape && sink->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a target detail is required because the configured command includes %%d"),
+ errhint("Try \"pg_basebackup --target shell:DETAIL ...\"")));
+ else if (!has_detail_escape && sink->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a target detail is not permitted because the configured command does not include %%d")));
+
+ /*
+ * Since we're passing the string provided by the user to popen(), it will
+ * be interpreted by the shell, which is a potential security
+ * vulnerability, since the user invoking this module is not necessarily
+ * a superuser. To stay out of trouble, we must disallow any shell
+ * metacharacters here; to be conservative and keep things simple, we
+ * allow only alphanumerics.
+ */
+ if (sink->target_detail != NULL)
+ {
+ char *d;
+ bool scary = false;
+
+ for (d = sink->target_detail; *d != '\0'; ++d)
+ {
+ if (*d >= 'a' && *d <= 'z')
+ continue;
+ if (*d >= 'A' && *d <= 'Z')
+ continue;
+ if (*d >= '0' && *d <= '9')
+ continue;
+ scary = true;
+ break;
+ }
+
+ if (scary)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("target detail must contain only alphanumeric characters"));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Construct the exact shell command that we're actually going to run,
+ * making substitutions as appropriate for escape sequences.
+ */
+static char *
+shell_construct_command(char *base_command, const char *filename,
+ char *target_detail)
+{
+ StringInfoData buf;
+ char *c;
+
+ initStringInfo(&buf);
+ for (c = base_command; *c != '\0'; ++c)
+ {
+ /* Anything other than '%' is copied verbatim. */
+ if (*c != '%')
+ {
+ appendStringInfoChar(&buf, *c);
+ continue;
+ }
+
+ /* Any time we see '%' we eat the following character as well. */
+ ++c;
+
+ /*
+ * The following character determines what we insert here, or may
+ * cause us to throw an error.
+ */
+ if (*c == '%')
+ {
+ /* '%%' is replaced by a single '%' */
+ appendStringInfoChar(&buf, '%');
+ }
+ else if (*c == 'f')
+ {
+ /* '%f' is replaced by the filename */
+ appendStringInfoString(&buf, filename);
+ }
+ else if (*c == 'd')
+ {
+ /* '%d' is replaced by the target detail */
+ appendStringInfoString(&buf, target_detail);
+ }
+ else if (*c == '\0')
+ {
+ /* Incomplete escape sequence, expected a character afterward */
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("shell command ends unexpectedly after escape character \"%%\""));
+ }
+ else
+ {
+ /* Unknown escape sequence */
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("shell command contains unexpected escape sequence \"%c\"",
+ *c));
+ }
+ }
+
+ return buf.data;
+}
+
+/*
+ * Finish executing the shell command once all data has been written.
+ */
+static void
+shell_finish_command(bbsink_shell *sink)
+{
+ int pclose_rc;
+
+ /* There should be a command running. */
+ Assert(sink->current_command != NULL);
+ Assert(sink->pipe != NULL);
+
+ /* Close down the pipe we opened. */
+ pclose_rc = ClosePipeStream(sink->pipe);
+ if (pclose_rc == -1)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not close pipe to external command: %m")));
+ else if (pclose_rc != 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+ errmsg("shell command \"%s\" failed",
+ sink->current_command),
+ errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+ }
+
+ /* Clean up. */
+ sink->pipe = NULL;
+ pfree(sink->current_command);
+ sink->current_command = NULL;
+}
+
+/*
+ * Start up the shell command, substituting %f in for the current filename.
+ */
+static void
+shell_run_command(bbsink_shell *sink, const char *filename)
+{
+ /* There should not be anything already running. */
+ Assert(sink->current_command == NULL);
+ Assert(sink->pipe == NULL);
+
+ /* Construct a suitable command. */
+ sink->current_command = shell_construct_command(sink->shell_command,
+ filename,
+ sink->target_detail);
+
+ /* Run it. */
+ sink->pipe = OpenPipeStream(sink->current_command, PG_BINARY_W);
+}
+
+/*
+ * Send accumulated data to the running shell command.
+ */
+static void
+shell_send_data(bbsink_shell *sink, size_t len)
+{
+ /* There should be a command running. */
+ Assert(sink->current_command != NULL);
+ Assert(sink->pipe != NULL);
+
+ /* Try to write the data. */
+ if (fwrite(sink->base.bbs_buffer, len, 1, sink->pipe) != 1 ||
+ ferror(sink->pipe))
+ {
+ if (errno == EPIPE)
+ {
+ /*
+ * The error we're about to throw would shut down the command
+ * anyway, but we may get a more meaningful error message by
+ * doing this. If not, we'll fall through to the generic error
+ * below.
+ */
+ shell_finish_command(sink);
+ errno = EPIPE;
+ }
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to shell backup program: %m")));
+ }
+}
+
+/*
+ * At start of archive, start up the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_run_command(mysink, archive_name);
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Send archive contents to command's stdin and forward to next sink.
+ */
+static void
+bbsink_shell_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_send_data(mysink, len);
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * At end of archive, shut down the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_end_archive(bbsink *sink)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_finish_command(mysink);
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * At start of manifest, start up the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_begin_manifest(bbsink *sink)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_run_command(mysink, "backup_manifest");
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Send manifest contents to command's stdin and forward to next sink.
+ */
+static void
+bbsink_shell_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_send_data(mysink, len);
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * At end of manifest, shut down the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_end_manifest(bbsink *sink)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_finish_command(mysink);
+ bbsink_forward_end_manifest(sink);
+}
--
2.24.3 (Apple Git-128)
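For anyone who wants to try this out, the setup would presumably look something like the following; the backup directory, role name, and command string are invented here, and only the %d/%f escapes come from the commit message above:

# postgresql.conf
shared_preload_libraries = 'basebackup_to_shell'
basebackup_to_shell.command = 'cat > /backups/%d.%f'
basebackup_to_shell.required_role = 'backup_admins'

$ pg_basebackup --target shell:mydetail -Xfetch

Each time the command runs, %d is replaced by "mydetail" and %f by base.tar, a $TABLESPACE_OID.tar file, or backup_manifest, with the file's contents arriving on the command's standard input. Note that the target detail must be strictly alphanumeric, since the command string is handed to popen().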
At 2022-02-02 10:55:53 -0500, robertmhaas@gmail.com wrote:
On Tue, Jan 18, 2022 at 1:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
0001 adds "server" and "blackhole" as backup targets. It now has some
tests. This might be more or less ready to ship, unless somebody else
sees a problem, or I find one.
I played around with this a bit and it seems quite easy to extend this
further. So please find attached a couple more patches to generalize
this mechanism.
It took me a while to assimilate these patches, including the backup
targets one, which I hadn't looked at before. Now that I've wrapped my
head around how to put the pieces together, I really like the idea. As
you say, writing non-trivial integrations in C will take some effort,
but it seems worthwhile. It's also nice that one can continue to use
pg_basebackup to trigger the backups and see progress information.
Granted, coding up a new base backup target is
something only experienced C hackers are likely to do, but the fact
that I was able to throw this together so quickly suggests to me that
I've got the design basically right, and that anyone who does want to
plug into the new mechanism shouldn't have too much trouble doing so.
Thoughts?
Yes, it looks simple to follow the example set by basebackup_to_shell to
write a custom target. The complexity will be in whatever we need to do
to store/forward the backup data, rather than in obtaining the data in
the first place, which is exactly as it should be.
Thanks!
-- Abhijit
Hi,
On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
jeevan.ladhe@enterprisedb.com> wrote:
Hi Robert,
I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.
Please find the attached patch to support client side compression
and decompression using lz4.
Added a new lz4 bbstreamer to compress the archive chunks at
client if the user has specified the --compress=client-lz4:[LEVEL] option
in pg_basebackup. The new streamer accepts archive chunks,
compresses them, and forwards them to the plain-writer.
Similarly, if a user has specified a server-compressed lz4 archive
with a plain format (-F p) backup, the compressed archive chunks
must be decompressed before being forwarded to the tar extractor.
Added a new bbstreamer to decompress the compressed archive
and forward it to the tar extractor.
Note: This patch can be applied on Jeevan Ladhe's v12 patch
for lz4 compression.
Thanks,
Dipesh
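For reference, the two cases would be exercised with invocations roughly like these (directories invented, compression level 5 arbitrary):

$ pg_basebackup -Ft -D /tmp/backup_tar --compress=client-lz4:5
$ pg_basebackup -Fp -D /tmp/backup_plain --compress=server-lz4

The first produces base.tar.lz4 using the new client-side compressor; the second asks the server to compress the archive and uses the new client-side decompressor ahead of the tar extractor.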
Attachments:
Attachment: v1-0001-support-client-side-compression-and-decompression-us.patch (text/x-patch)
From 67e47579e119897c66e6f5f7a5e5e9542399072f Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Thu, 3 Feb 2022 18:31:03 +0530
Subject: [PATCH] support client side compression and decompression using LZ4
---
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 3 +
src/bin/pg_basebackup/bbstreamer_lz4.c | 436 ++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 32 +-
src/bin/pg_verifybackup/t/009_extract.pl | 7 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++++++
src/tools/msvc/Mkvcbuild.pm | 1 +
7 files changed, 585 insertions(+), 6 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c
create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index ada3a5a..1d0db4f 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -43,6 +43,7 @@ BBOBJS = \
bbstreamer_file.o \
bbstreamer_gzip.o \
bbstreamer_inject.o \
+ bbstreamer_lz4.o \
bbstreamer_tar.o
all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fe49ae3..c2de77b 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
new file mode 100644
index 0000000..9055a23
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -0,0 +1,436 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_lz4.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_lz4.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBLZ4
+typedef struct bbstreamer_lz4_frame
+{
+ bbstreamer base;
+
+ LZ4F_compressionContext_t cctx;
+ LZ4F_decompressionContext_t dctx;
+ LZ4F_preferences_t prefs;
+
+ size_t bytes_written;
+ bool header_written;
+} bbstreamer_lz4_frame;
+
+static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
+ .content = bbstreamer_lz4_compressor_content,
+ .finalize = bbstreamer_lz4_compressor_finalize,
+ .free = bbstreamer_lz4_compressor_free
+};
+
+static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
+ .content = bbstreamer_lz4_decompressor_content,
+ .finalize = bbstreamer_lz4_decompressor_finalize,
+ .free = bbstreamer_lz4_decompressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs lz4 compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBLZ4
+ bbstreamer_lz4_frame *streamer;
+ LZ4F_errorCode_t ctxError;
+ LZ4F_preferences_t *prefs;
+ size_t compressed_bound;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_lz4_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_lz4_compressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->header_written = false;
+
+ /* Initialize stream compression preferences */
+ prefs = &streamer->prefs;
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->compressionLevel = compresslevel;
+
+ /*
+ * Find out the compression bound, it specifies the minimum destination
+ * capacity required in worst case for the success of compression operation
+ * (LZ4F_compressUpdate) based on a given source size and preferences.
+ */
+ compressed_bound = LZ4F_compressBound(streamer->base.bbs_buffer.maxlen, prefs);
+
+ /* Align the output buffer length. */
+ compressed_bound += compressed_bound + BLCKSZ - (compressed_bound % BLCKSZ);
+
+ /* Enlarge buffer if it falls short of compression bound. */
+ if (streamer->base.bbs_buffer.maxlen <= compressed_bound)
+ enlargeStringInfo(&streamer->base.bbs_buffer, compressed_bound);
+
+ ctxError = LZ4F_createCompressionContext(&streamer->cctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ pg_log_error("could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+/*
+ * Compress the input data to output buffer.
+ *
+ * Find out the compression bound based on input data length for each
+ * invocation to make sure that output buffer has enough capacity to
+ * accommodate the compressed data. In case if the output buffer
+ * capacity falls short of compression bound then forward the content
+ * of output buffer to next streamer and empty the buffer.
+ */
+static void
+bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_lz4_frame *mystreamer;
+ uint8 *next_in,
+ *next_out;
+ size_t out_bound,
+ compressed_size,
+ avail_in,
+ avail_out;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ next_in = (uint8 *) data;
+ avail_in = len;
+
+ /* Write header before processing the first input chunk. */
+ if (!mystreamer->header_written)
+ {
+ compressed_size = LZ4F_compressBegin(mystreamer->cctx,
+ (uint8 *) mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ &mystreamer->prefs);
+
+ if (LZ4F_isError(compressed_size))
+ pg_log_error("could not write lz4 header: %s",
+ LZ4F_getErrorName(compressed_size));
+
+ mystreamer->bytes_written += compressed_size;
+ mystreamer->header_written = true;
+ }
+
+ /*
+ * Update the offset and capacity of output buffer based on based on number
+ * of bytes written to output buffer.
+ */
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * Find out the compression bound and make sure that output buffer has the
+ * required capacity for the success of LZ4F_compressUpdate. If needed
+ * forward the content to next streamer and empty the buffer.
+ */
+ out_bound = LZ4F_compressBound(avail_in, &mystreamer->prefs);
+ Assert(mystreamer->base.bbs_buffer.maxlen >= out_bound);
+ if (avail_out <= out_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
+
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->bytes_written = 0;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ }
+
+ /*
+ * This call compresses the data starting at next_in and generates the
+ * output starting at next_out. It expects the caller to provide the size
+ * of input buffer and capacity of output buffer by providing parameters
+ * avail_in and avail_out.
+ *
+ * It returns the number of bytes compressed to output buffer.
+ */
+ compressed_size = LZ4F_compressUpdate(mystreamer->cctx,
+ next_out, avail_out,
+ next_in, avail_in, NULL);
+
+ if (LZ4F_isError(compressed_size))
+ pg_log_error("could not compress data: %s",
+ LZ4F_getErrorName(compressed_size));
+
+ mystreamer->bytes_written += compressed_size;
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+ uint8 *next_out;
+ size_t footer_bound,
+ compressed_size,
+ avail_out;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+
+ /* Find out the footer bound and update the output buffer. */
+ footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
+ Assert(mystreamer->base.bbs_buffer.maxlen >= footer_bound);
+ if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <=
+ footer_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ BBSTREAMER_UNKNOWN);
+
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->bytes_written = 0;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ }
+ else
+ {
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+ }
+
+ /*
+ * Finalize the frame and flush whatever data remaining in compression
+ * context.
+ */
+ compressed_size = LZ4F_compressEnd(mystreamer->cctx,
+ next_out, avail_out, NULL);
+
+ if (LZ4F_isError(compressed_size))
+ pg_log_error("could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressed_size));
+
+ mystreamer->bytes_written += compressed_size;
+
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ bbstreamer_free(streamer->bbs_next);
+ LZ4F_freeCompressionContext(mystreamer->cctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of lz4
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_lz4_decompressor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBLZ4
+ bbstreamer_lz4_frame *streamer;
+ LZ4F_errorCode_t ctxError;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_lz4_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_lz4_decompressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ ctxError = LZ4F_createDecompressionContext(&streamer->dctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ {
+ pg_log_error("could not initialize compression library: %s",
+ LZ4F_getErrorName(ctxError));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+/*
+ * Decompress the input data to output buffer until we run out of input
+ * data. Each time the output buffer is full, pass on the decompressed data
+ * to the next streamer.
+ */
+static void
+bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_lz4_frame *mystreamer;
+ uint8 *next_in,
+ *next_out;
+ size_t avail_in,
+ avail_out;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ next_in = (uint8 *) data;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ avail_in = len;
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+
+ while (avail_in > 0)
+ {
+ size_t ret,
+ read_size,
+ out_size;
+
+ read_size = avail_in;
+ out_size = avail_out;
+
+ /*
+ * This call decompresses the data starting at next_in and generates
+ * the output data starting at next_out. It expects the caller to
+ * provide size of the input buffer and total capacity of the output
+ * buffer by providing the read_size and out_size parameters
+ * respectively.
+ *
+ * Per the documentation of LZ4, parameters read_size and out_size
+ * behave as dual in/out parameters. On return, the number of bytes consumed
+ * from the input buffer will be written back to read_size and the
+ * number of bytes decompressed to output buffer will be written back
+ * to out_size respectively.
+ */
+ ret = LZ4F_decompress(mystreamer->dctx,
+ next_out, &out_size,
+ next_in, &read_size, NULL);
+
+ if (LZ4F_isError(ret))
+ pg_log_error("could not decompress data: %s",
+ LZ4F_getErrorName(ret));
+
+ /* Update input buffer based on number of bytes consumed */
+ avail_in -= read_size;
+ next_in += read_size;
+
+ mystreamer->bytes_written += out_size;
+
+ /*
+ * If output buffer is full then forward the content to next streamer and
+ * update the output buffer.
+ */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
+
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->bytes_written = 0;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ }
+ else
+ {
+ avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+ next_out += mystreamer->bytes_written;
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+
+ /*
+ * End of the stream, if there is some pending data in output buffers then
+ * we must forward it to next streamer.
+ */
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ bbstreamer_free(streamer->bbs_next);
+ LZ4F_freeDecompressionContext(mystreamer->dctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 923659d..00b2563 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1003,6 +1003,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ {
+ *methodres = COMPRESSION_LZ4;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
{
*methodres = COMPRESSION_LZ4;
@@ -1125,7 +1130,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz;
+ is_tar_gz,
+ is_tar_lz4;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1144,6 +1150,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_gz = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+ /* Is this a LZ4 archive? */
+ is_tar_lz4 = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1153,7 +1163,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz)
+ if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1217,6 +1227,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file,
compresslevel);
}
+ else if (compressmethod == COMPRESSION_LZ4)
+ {
+ strlcat(archive_filename, ".lz4", sizeof(archive_filename));
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
+ streamer = bbstreamer_lz4_compressor_new(streamer,
+ compresslevel);
+ }
else
{
Assert(false); /* not reachable */
@@ -1269,9 +1287,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
- compressloc == COMPRESS_LOCATION_SERVER)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ {
+ if (compressmethod == COMPRESSION_GZIP)
+ streamer = bbstreamer_gzip_decompressor_new(streamer);
+ else if (compressmethod == COMPRESSION_LZ4)
+ streamer = bbstreamer_lz4_decompressor_new(streamer);
+ }
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 51b77e4..9f9a7cc 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 4;
+use Test::More tests => 6;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -27,6 +27,11 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => ['--compress', 'server-gzip:5'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'server-lz4:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
}
);
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
new file mode 100644
index 0000000..34c9b90
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -0,0 +1,111 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test case aims to verify that client-side backup compression works
+# properly, and it also aims to verify that pg_verifybackup can verify a base
+# backup that didn't start out in plain format.
+
+use strict;
+use warnings;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 9;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my $backup_path = $primary->backup_dir . '/client-backup';
+my $extract_path = $primary->backup_dir . '/extracted-backup';
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'backup_archive' => 'base.tar',
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--compress', 'client-gzip:5'],
+ 'backup_archive' => 'base.tar.gz',
+ 'decompress_program' => $ENV{'GZIP_PROGRAM'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'client-lz4:5'],
+ 'backup_archive' => 'base.tar.lz4',
+ 'decompress_program' => $ENV{'LZ4'},
+ 'decompress_flags' => [ '-d' ],
+ 'output_file' => 'base.tar',
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+ skip "no decompressor available for $method", 3
+ if exists $tc->{'decompress_program'} &&
+ !defined $tc->{'decompress_program'};
+
+ # Take a client-side backup.
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path,
+ '-Xfetch', '--no-sync', '-cfast', '-Ft');
+ push @backup, @{$tc->{'backup_flags'}};
+ $primary->command_ok(\@backup,
+ "client side backup, compression $method");
+
+
+ # Verify that we got the files we expected.
+ my $backup_files = join(',',
+ sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
+ my $expected_backup_files = join(',',
+ sort ('backup_manifest', $tc->{'backup_archive'}));
+ is($backup_files,$expected_backup_files,
+ "found expected backup files, compression $method");
+
+ # Decompress.
+ if (exists $tc->{'decompress_program'})
+ {
+ my @decompress = ($tc->{'decompress_program'});
+ push @decompress, @{$tc->{'decompress_flags'}}
+ if $tc->{'decompress_flags'};
+ push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ push @decompress, $backup_path . '/' . $tc->{'output_file'}
+ if $tc->{'output_file'};
+ system_or_bail(@decompress);
+ }
+
+ SKIP: {
+ my $tar = $ENV{TAR};
+ # don't check for a working tar here, to accommodate various odd
+ # cases such as AIX. If tar doesn't work the init_from_backup below
+ # will fail.
+ skip "no tar program available", 1
+ if (!defined $tar || $tar eq '');
+
+ # Untar.
+ mkdir($extract_path);
+ system_or_bail($tar, 'xf', $backup_path . '/base.tar',
+ '-C', $extract_path);
+
+ # Verify.
+ $primary->command_ok([ 'pg_verifybackup', '-n',
+ '-m', "$backup_path/backup_manifest", '-e', $extract_path ],
+ "verify backup, compression $method");
+ }
+
+ # Cleanup.
+ rmtree($extract_path);
+ rmtree($backup_path);
+ }
+}
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index a310bcb..bab81bd 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -379,6 +379,7 @@ sub mkvcbuild
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_file.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_gzip.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_inject.c');
+ $pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_lz4.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_tar.c');
$pgbasebackup->AddLibrary('ws2_32.lib');
--
1.8.3.1
Thanks for the patch, Dipesh.
With a quick look at the patch I have the following observations:
----------------------------------------------------------
In bbstreamer_lz4_compressor_new(), I think this alignment is not needed
on client side:
/* Align the output buffer length. */
compressed_bound += compressed_bound + BLCKSZ - (compressed_bound %
BLCKSZ);
----------------------------------------------------------
bbstreamer_lz4_compressor_content(), avail_in and len variables both are
not changed. I think we can simply change the len to avail_in in the
argument list.
----------------------------------------------------------
Comment:
+ * Update the offset and capacity of output buffer based on based
on number
+ * of bytes written to output buffer.
I think it is a thinko:
+ * Update the offset and capacity of output buffer based on number
of
+ * bytes written to output buffer.
----------------------------------------------------------
Indentation:
+ if ((mystreamer->base.bbs_buffer.maxlen -
mystreamer->bytes_written) <=
+ footer_bound)
----------------------------------------------------------
I think similar to bbstreamer_lz4_compressor_content() in
bbstreamer_lz4_decompressor_content() we can change len to avail_in.
Regards,
Jeevan Ladhe
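As an aside for readers unfamiliar with the lz4 frame API: the invariant both the patch and the first review point rely on is that each LZ4F_compress* call must be offered at least LZ4F_compressBound() bytes of destination capacity. Here is a standalone sketch of the begin/update/end pattern, with invented buffer sizing and asserts standing in for real error handling:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#include <lz4frame.h>

/* Compress one chunk into a freshly allocated, complete lz4 frame. */
static size_t
compress_one_chunk(const char *in, size_t in_len, char **out_p)
{
	LZ4F_preferences_t prefs;
	LZ4F_cctx  *cctx;
	LZ4F_errorCode_t err;
	size_t		cap, n, off = 0;
	char	   *out;

	memset(&prefs, 0, sizeof(prefs));
	prefs.frameInfo.blockSizeID = LZ4F_max256KB;

	/* Worst-case output for one update of in_len bytes... */
	cap = LZ4F_compressBound(in_len, &prefs);
	/* ...tripled, crudely, so the header and footer also always fit. */
	out = malloc(cap * 3);

	err = LZ4F_createCompressionContext(&cctx, LZ4F_VERSION);
	assert(!LZ4F_isError(err));

	n = LZ4F_compressBegin(cctx, out + off, cap, &prefs);	/* header */
	assert(!LZ4F_isError(n));
	off += n;

	n = LZ4F_compressUpdate(cctx, out + off, cap, in, in_len, NULL);
	assert(!LZ4F_isError(n));
	off += n;

	n = LZ4F_compressEnd(cctx, out + off, cap, NULL);	/* flush + footer */
	assert(!LZ4F_isError(n));
	off += n;

	LZ4F_freeCompressionContext(cctx);
	*out_p = out;
	return off;
}

Since LZ4F_compressBound() already accounts for worst-case expansion, there does indeed seem to be no need for the extra BLCKSZ alignment on the client side, as noted above.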
On Thu, 10 Feb 2022 at 18:11, Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Hi,
On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
jeevan.ladhe@enterprisedb.com> wrote:
Hi Robert,
I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.
Please find the attached patch to support client side compression
and decompression using lz4.
Added a new lz4 bbstreamer to compress the archive chunks at
client if the user has specified the --compress=client-lz4:[LEVEL] option
in pg_basebackup. The new streamer accepts archive chunks,
compresses them, and forwards them to the plain-writer.
Similarly, if a user has specified a server-compressed lz4 archive
with a plain format (-F p) backup, the compressed archive chunks
must be decompressed before being forwarded to the tar extractor.
Added a new bbstreamer to decompress the compressed archive
and forward it to the tar extractor.
Note: This patch can be applied on Jeevan Ladhe's v12 patch
for lz4 compression.
Thanks,
Dipesh
Hi,
Thanks for the feedback, I have incorporated the suggestions
and updated a new patch. PFA v2 patch.
I think similar to bbstreamer_lz4_compressor_content() in
bbstreamer_lz4_decompressor_content() we can change len to avail_in.
In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
based on the number of bytes decompressed in each iteration. I think
we cannot replace it with "len" here.
Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase. I have applied it on commit 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
to validate my v2 patch.
Thanks,
Dipesh
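To illustrate the dual in/out parameter behaviour being discussed, here is a stripped-down version of the decompression loop; the sink callback is invented and asserts stand in for real error handling:

#include <assert.h>
#include <stddef.h>

#include <lz4frame.h>

static void
decompress_chunk(LZ4F_dctx *dctx, const char *in, size_t in_len,
				 char *out, size_t out_cap,
				 void (*sink) (const char *, size_t))
{
	while (in_len > 0)
	{
		size_t		read_size = in_len; /* in: input bytes available */
		size_t		out_size = out_cap; /* in: output space available */
		size_t		ret;

		ret = LZ4F_decompress(dctx, out, &out_size, in, &read_size, NULL);
		assert(!LZ4F_isError(ret));

		/* out: bytes actually consumed and produced by this call */
		in += read_size;
		in_len -= read_size;
		if (out_size > 0)
			sink(out, out_size);
	}
}

Because read_size is rewritten on return, the input pointer must advance by exactly that amount each iteration, which is why avail_in cannot simply be replaced by len in this function.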
Attachments:
Attachment: v2-0001-support-client-side-compression-and-decompression-us.patch (text/x-patch)
From 47a0ef4348747ffa61eccd7954e00f3cf5fc7222 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Thu, 3 Feb 2022 18:31:03 +0530
Subject: [PATCH] support client side compression and decompression using LZ4
---
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 3 +
src/bin/pg_basebackup/bbstreamer_lz4.c | 431 ++++++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 32 +-
src/bin/pg_verifybackup/t/009_extract.pl | 7 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++++++
src/tools/msvc/Mkvcbuild.pm | 1 +
7 files changed, 580 insertions(+), 6 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c
create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index ada3a5a..1d0db4f 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -43,6 +43,7 @@ BBOBJS = \
bbstreamer_file.o \
bbstreamer_gzip.o \
bbstreamer_inject.o \
+ bbstreamer_lz4.o \
bbstreamer_tar.o
all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fe49ae3..c2de77b 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
new file mode 100644
index 0000000..f0bc226
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -0,0 +1,431 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_lz4.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_lz4.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBLZ4
+typedef struct bbstreamer_lz4_frame
+{
+ bbstreamer base;
+
+ LZ4F_compressionContext_t cctx;
+ LZ4F_decompressionContext_t dctx;
+ LZ4F_preferences_t prefs;
+
+ size_t bytes_written;
+ bool header_written;
+} bbstreamer_lz4_frame;
+
+static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
+ .content = bbstreamer_lz4_compressor_content,
+ .finalize = bbstreamer_lz4_compressor_finalize,
+ .free = bbstreamer_lz4_compressor_free
+};
+
+static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
+ .content = bbstreamer_lz4_decompressor_content,
+ .finalize = bbstreamer_lz4_decompressor_finalize,
+ .free = bbstreamer_lz4_decompressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs lz4 compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBLZ4
+ bbstreamer_lz4_frame *streamer;
+ LZ4F_errorCode_t ctxError;
+ LZ4F_preferences_t *prefs;
+ size_t compressed_bound;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_lz4_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_lz4_compressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ streamer->header_written = false;
+
+ /* Initialize stream compression preferences */
+ prefs = &streamer->prefs;
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->compressionLevel = compresslevel;
+
+ /*
+ * Find out the compression bound: the minimum destination capacity
+ * required, in the worst case, for the compression operation
+ * (LZ4F_compressUpdate) to succeed for a given source size and
+ * preferences.
+ */
+ compressed_bound = LZ4F_compressBound(streamer->base.bbs_buffer.maxlen, prefs);
+
+ /* Enlarge buffer if it falls short of compression bound. */
+ if (streamer->base.bbs_buffer.maxlen <= compressed_bound)
+ enlargeStringInfo(&streamer->base.bbs_buffer, compressed_bound);
+
+ ctxError = LZ4F_createCompressionContext(&streamer->cctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ pg_log_error("could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+/*
+ * Compress the input data into the output buffer.
+ *
+ * Compute the compression bound for the input length on each invocation
+ * to make sure that the output buffer has enough capacity to accommodate
+ * the compressed data. If the remaining output buffer capacity falls
+ * short of the compression bound, forward the contents of the output
+ * buffer to the next streamer and empty the buffer.
+ */
+static void
+bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_lz4_frame *mystreamer;
+ uint8 *next_in,
+ *next_out;
+ size_t out_bound,
+ compressed_size,
+ avail_out;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ next_in = (uint8 *) data;
+
+ /* Write header before processing the first input chunk. */
+ if (!mystreamer->header_written)
+ {
+ compressed_size = LZ4F_compressBegin(mystreamer->cctx,
+ (uint8 *) mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ &mystreamer->prefs);
+
+ if (LZ4F_isError(compressed_size))
+ pg_log_error("could not write lz4 header: %s",
+ LZ4F_getErrorName(compressed_size));
+
+ mystreamer->bytes_written += compressed_size;
+ mystreamer->header_written = true;
+ }
+
+ /*
+ * Update the offset and remaining capacity of the output buffer based on
+ * the number of bytes already written to it.
+ */
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+
+ /*
+ * Compute the compression bound and make sure that the output buffer has
+ * enough capacity for LZ4F_compressUpdate to succeed. If needed, forward
+ * the current contents to the next streamer and empty the buffer.
+ */
+ out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
+ Assert(mystreamer->base.bbs_buffer.maxlen >= out_bound);
+ if (avail_out <= out_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ context);
+
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->bytes_written = 0;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ }
+
+ /*
+ * This call compresses the data starting at next_in and generates the
+ * output starting at next_out. The caller provides the input size and
+ * the output capacity via the len and avail_out parameters.
+ *
+ * It returns the number of bytes written to the output buffer.
+ */
+ compressed_size = LZ4F_compressUpdate(mystreamer->cctx,
+ next_out, avail_out,
+ next_in, len, NULL);
+
+ if (LZ4F_isError(compressed_size))
+ pg_log_error("could not compress data: %s",
+ LZ4F_getErrorName(compressed_size));
+
+ mystreamer->bytes_written += compressed_size;
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+ uint8 *next_out;
+ size_t footer_bound,
+ compressed_size,
+ avail_out;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+
+ /* Find out the footer bound and update the output buffer. */
+ footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
+ Assert(mystreamer->base.bbs_buffer.maxlen >= footer_bound);
+ if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <=
+ footer_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ BBSTREAMER_UNKNOWN);
+
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->bytes_written = 0;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ }
+ else
+ {
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data + mystreamer->bytes_written;
+ avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+ }
+
+ /*
+ * Finalize the frame and flush whatever data remains in the compression
+ * context.
+ */
+ compressed_size = LZ4F_compressEnd(mystreamer->cctx,
+ next_out, avail_out, NULL);
+
+ if (LZ4F_isError(compressed_size))
+ pg_log_error("could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressed_size));
+
+ mystreamer->bytes_written += compressed_size;
+
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_lz4_compressor_free(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ bbstreamer_free(streamer->bbs_next);
+ LZ4F_freeCompressionContext(mystreamer->cctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of lz4
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_lz4_decompressor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBLZ4
+ bbstreamer_lz4_frame *streamer;
+ LZ4F_errorCode_t ctxError;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_lz4_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_lz4_decompressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+
+ /* Initialize internal stream state for decompression */
+ ctxError = LZ4F_createDecompressionContext(&streamer->dctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ {
+ pg_log_error("could not initialize compression library: %s",
+ LZ4F_getErrorName(ctxError));
+ exit(1);
+ }
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+/*
+ * Decompress the input data into the output buffer until we run out of input
+ * data. Each time the output buffer is full, pass on the decompressed data
+ * to the next streamer.
+ */
+static void
+bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_lz4_frame *mystreamer;
+ uint8 *next_in,
+ *next_out;
+ size_t avail_in,
+ avail_out;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ next_in = (uint8 *) data;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ avail_in = len;
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+
+ while (avail_in > 0)
+ {
+ size_t ret,
+ read_size,
+ out_size;
+
+ read_size = avail_in;
+ out_size = avail_out;
+
+ /*
+ * This call decompresses the data starting at next_in and generates
+ * the output data starting at next_out. The caller provides the size
+ * of the input buffer and the total capacity of the output buffer via
+ * the read_size and out_size parameters, respectively.
+ *
+ * Per the LZ4 documentation, read_size and out_size behave as dual
+ * (in/out) parameters. On return, the number of bytes consumed from
+ * the input buffer is written back to read_size, and the number of
+ * bytes decompressed into the output buffer is written back to
+ * out_size.
+ */
+ ret = LZ4F_decompress(mystreamer->dctx,
+ next_out, &out_size,
+ next_in, &read_size, NULL);
+
+ if (LZ4F_isError(ret))
+ pg_log_error("could not decompress data: %s",
+ LZ4F_getErrorName(ret));
+
+ /* Update input buffer based on number of bytes consumed */
+ avail_in -= read_size;
+ next_in += read_size;
+
+ mystreamer->bytes_written += out_size;
+
+ /*
+ * If the output buffer is full, forward its contents to the next
+ * streamer and reset the buffer.
+ */
+ if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ context);
+
+ avail_out = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->bytes_written = 0;
+ next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
+ }
+ else
+ {
+ avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
+ next_out += mystreamer->bytes_written;
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+
+ /*
+ * At the end of the stream, any data still pending in the output buffer
+ * must be forwarded to the next streamer.
+ */
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_lz4_decompressor_free(bbstreamer *streamer)
+{
+ bbstreamer_lz4_frame *mystreamer;
+
+ mystreamer = (bbstreamer_lz4_frame *) streamer;
+ bbstreamer_free(streamer->bbs_next);
+ LZ4F_freeDecompressionContext(mystreamer->dctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 923659d..00b2563 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1003,6 +1003,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ {
+ *methodres = COMPRESSION_LZ4;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
{
*methodres = COMPRESSION_LZ4;
@@ -1125,7 +1130,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz;
+ is_tar_gz,
+ is_tar_lz4;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1144,6 +1150,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_gz = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+ /* Is this an LZ4 archive? */
+ is_tar_lz4 = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+
/*
* We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1153,7 +1163,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz)
+ if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1217,6 +1227,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file,
compresslevel);
}
+ else if (compressmethod == COMPRESSION_LZ4)
+ {
+ strlcat(archive_filename, ".lz4", sizeof(archive_filename));
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
+ streamer = bbstreamer_lz4_compressor_new(streamer,
+ compresslevel);
+ }
else
{
Assert(false); /* not reachable */
@@ -1269,9 +1287,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
- compressloc == COMPRESS_LOCATION_SERVER)
- streamer = bbstreamer_gzip_decompressor_new(streamer);
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ {
+ if (compressmethod == COMPRESSION_GZIP)
+ streamer = bbstreamer_gzip_decompressor_new(streamer);
+ else if (compressmethod == COMPRESSION_LZ4)
+ streamer = bbstreamer_lz4_decompressor_new(streamer);
+ }
/* Return the results. */
*manifest_inject_streamer_p = manifest_inject_streamer;
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 51b77e4..9f9a7cc 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 4;
+use Test::More tests => 6;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -27,6 +27,11 @@ my @test_configuration = (
'compression_method' => 'gzip',
'backup_flags' => ['--compress', 'server-gzip:5'],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'server-lz4:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
}
);
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
new file mode 100644
index 0000000..34c9b90
--- /dev/null
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -0,0 +1,111 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# This test case aims to verify that client-side backup compression works
+# properly, and it also aims to verify that pg_verifybackup can verify a base
+# backup that didn't start out in plain format.
+
+use strict;
+use warnings;
+use Config;
+use File::Path qw(rmtree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 9;
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->start;
+
+my $backup_path = $primary->backup_dir . '/client-backup';
+my $extract_path = $primary->backup_dir . '/extracted-backup';
+
+my @test_configuration = (
+ {
+ 'compression_method' => 'none',
+ 'backup_flags' => [],
+ 'backup_archive' => 'base.tar',
+ 'enabled' => 1
+ },
+ {
+ 'compression_method' => 'gzip',
+ 'backup_flags' => ['--compress', 'client-gzip:5'],
+ 'backup_archive' => 'base.tar.gz',
+ 'decompress_program' => $ENV{'GZIP_PROGRAM'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'client-lz4:5'],
+ 'backup_archive' => 'base.tar.lz4',
+ 'decompress_program' => $ENV{'LZ4'},
+ 'decompress_flags' => [ '-d' ],
+ 'output_file' => 'base.tar',
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ }
+);
+
+for my $tc (@test_configuration)
+{
+ my $method = $tc->{'compression_method'};
+
+ SKIP: {
+ skip "$method compression not supported by this build", 3
+ if ! $tc->{'enabled'};
+ skip "no decompressor available for $method", 3
+ if exists $tc->{'decompress_program'} &&
+ !defined $tc->{'decompress_program'};
+
+ # Take a client-side backup.
+ my @backup = (
+ 'pg_basebackup', '-D', $backup_path,
+ '-Xfetch', '--no-sync', '-cfast', '-Ft');
+ push @backup, @{$tc->{'backup_flags'}};
+ $primary->command_ok(\@backup,
+ "client side backup, compression $method");
+
+
+ # Verify that we got the files we expected.
+ my $backup_files = join(',',
+ sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
+ my $expected_backup_files = join(',',
+ sort ('backup_manifest', $tc->{'backup_archive'}));
+ is($backup_files,$expected_backup_files,
+ "found expected backup files, compression $method");
+
+ # Decompress.
+ if (exists $tc->{'decompress_program'})
+ {
+ my @decompress = ($tc->{'decompress_program'});
+ push @decompress, @{$tc->{'decompress_flags'}}
+ if $tc->{'decompress_flags'};
+ push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
+ push @decompress, $backup_path . '/' . $tc->{'output_file'}
+ if $tc->{'output_file'};
+ system_or_bail(@decompress);
+ }
+
+ SKIP: {
+ my $tar = $ENV{TAR};
+ # don't check for a working tar here, to accommodate various odd
+ # cases such as AIX. If tar doesn't work the init_from_backup below
+ # will fail.
+ skip "no tar program available", 1
+ if (!defined $tar || $tar eq '');
+
+ # Untar.
+ mkdir($extract_path);
+ system_or_bail($tar, 'xf', $backup_path . '/base.tar',
+ '-C', $extract_path);
+
+ # Verify.
+ $primary->command_ok([ 'pg_verifybackup', '-n',
+ '-m', "$backup_path/backup_manifest", '-e', $extract_path ],
+ "verify backup, compression $method");
+ }
+
+ # Cleanup.
+ rmtree($extract_path);
+ rmtree($backup_path);
+ }
+}
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index a310bcb..bab81bd 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -379,6 +379,7 @@ sub mkvcbuild
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_file.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_gzip.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_inject.c');
+ $pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_lz4.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_tar.c');
$pgbasebackup->AddLibrary('ws2_32.lib');
--
1.8.3.1
Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase.
Sure, please find the rebased patch attached.
Regards,
Jeevan
On Fri, 11 Feb 2022 at 14:13, Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Hi,
Thanks for the feedback, I have incorporated the suggestions
and updated a new patch. PFA v2 patch.

I think similar to bbstreamer_lz4_compressor_content() in
bbstreamer_lz4_decompressor_content() we can change len to avail_in.

In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
based on the number of bytes decompressed in each iteration. I think
we cannot replace it with "len" here.

Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase. I have applied it on commit
400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
to validate my v2 patch.

Thanks,
Dipesh
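To make the in/out behaviour under discussion concrete, here is a minimal
sketch of the LZ4F_decompress() calling convention (function and variable
names here are illustrative, not taken from the patch):

#include <lz4frame.h>

static void
decompress_chunk(LZ4F_decompressionContext_t dctx,
                 const char *data, size_t len,
                 char *out, size_t out_cap)
{
    const char *next_in = data;
    size_t      avail_in = len;

    while (avail_in > 0)
    {
        size_t      read_size = avail_in;   /* in: input bytes available */
        size_t      out_size = out_cap;     /* in: output capacity */
        size_t      ret;

        ret = LZ4F_decompress(dctx, out, &out_size,
                              next_in, &read_size, NULL);
        if (LZ4F_isError(ret))
            break;              /* real code would report the error */

        /*
         * On return, both sizes have been overwritten by the call, so the
         * remaining input must be recomputed on every iteration -- which
         * is why "len" cannot simply stand in for avail_in.
         */
        next_in += read_size;
        avail_in -= read_size;
        /* ... hand off out_size decompressed bytes here ... */
    }
}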
Attachments:
v13-0001-Add-a-LZ4-compression-method-for-server-side-compres.patchapplication/octet-stream; name=v13-0001-Add-a-LZ4-compression-method-for-server-side-compres.patchDownload
From 683fc703574ca27cd3e1a1d3d436ee56fcb0f7d4 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Fri, 11 Feb 2022 15:35:46 +0530
Subject: [PATCH] Add a LZ4 compression method for server side compression.
Add LZ4 server side compression option --compress=server-lz4
Provide compression-level for lz4 compression.
Add tap test scenario in pg_verifybackup for lz4.
Add documentation.
Add pg_basebackup help for lz4 option.
Example usage:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-lz4:4
---
doc/src/sgml/protocol.sgml | 7 +-
doc/src/sgml/ref/pg_basebackup.sgml | 24 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_lz4.c | 298 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 18 +-
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 10 +-
src/include/replication/basebackup_sink.h | 1 +
9 files changed, 349 insertions(+), 18 deletions(-)
create mode 100644 src/backend/replication/basebackup_lz4.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index fd03c860bd..1c5ab00879 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2724,8 +2724,8 @@ The commands accepted in replication mode are:
<listitem>
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the only supported method is
- <literal>gzip</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>
+ and <literal>lz4</literal>.
</para>
</listitem>
</varlistentry>
@@ -2736,7 +2736,8 @@ The commands accepted in replication mode are:
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- The value should be an integer between 1 and 9.
+ For <literal>gzip</literal> the value should be an integer between 1
+ and 9, and for <literal>lz4</literal> it should be between 1 and 12.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index e7ae29ec3d..7a1b432eba 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,10 +417,13 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to either <literal>gzip</literal>
- for compression with <application>gzip</application>, or
- <literal>none</literal> for no compression. A compression level
- can be optionally specified, by appending the level number after a
+ The compression method can be set to <literal>gzip</literal> for
+ compression with <application>gzip</application>, or
+ <literal>lz4</literal> for compression with
+ <application>lz4</application>, or <literal>none</literal> for no
+ compression. However, <literal>lz4</literal> can currently be used
+ only with <literal>server</literal>. A compression level can be
+ optionally specified, by appending the level number after a
colon (<literal>:</literal>). If no level is specified, the default
compression level will be used. If only a level is specified without
mentioning an algorithm, <literal>gzip</literal> compression will
@@ -428,12 +431,13 @@ PostgreSQL documentation
used if the level is 0.
</para>
<para>
- When the tar format is used, the suffix <filename>.gz</filename> will
- automatically be added to all tar filenames. When the plain format is
- used, client-side compression may not be specified, but it is
- still possible to request server-side compression. If this is done,
- the server will compress the backup for transmission, and the
- client will decompress and extract it.
+ When the tar format is used with <literal>gzip</literal> or
+ <literal>lz4</literal>, the suffix <filename>.gz</filename> or
+ <filename>.lz4</filename> will automatically be added to all tar
+ filenames. When the plain format is used, client-side compression may
+ not be specified, but it is still possible to request server-side
+ compression. If this is done, the server will compress the backup for
+ transmission, and the client will decompress and extract it.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 8ec60ded76..74043ff331 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -19,6 +19,7 @@ OBJS = \
basebackup.o \
basebackup_copy.o \
basebackup_gzip.o \
+ basebackup_lz4.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fcd9161f74..0bf28b55d7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -63,7 +63,8 @@ typedef enum
typedef enum
{
BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4
} basebackup_compression_type;
typedef struct
@@ -903,6 +904,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_NONE;
else if (strcmp(optval, "gzip") == 0)
opt->compression = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(optval, "lz4") == 0)
+ opt->compression = BACKUP_COMPRESSION_LZ4;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1021,6 +1024,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
sink = bbsink_gzip_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_LZ4)
+ sink = bbsink_lz4_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
new file mode 100644
index 0000000000..2a169d2e67
--- /dev/null
+++ b/src/backend/replication/basebackup_lz4.c
@@ -0,0 +1,298 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_lz4.c
+ * Basebackup sink implementing lz4 compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBLZ4
+#include <lz4frame.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBLZ4
+
+typedef struct bbsink_lz4
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level. */
+ int compresslevel;
+
+ LZ4F_compressionContext_t ctx;
+ LZ4F_preferences_t prefs;
+
+ /* Number of bytes staged in output buffer. */
+ size_t bytes_written;
+} bbsink_lz4;
+
+static void bbsink_lz4_begin_backup(bbsink *sink);
+static void bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_lz4_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_lz4_end_archive(bbsink *sink);
+static void bbsink_lz4_cleanup(bbsink *sink);
+
+const bbsink_ops bbsink_lz4_ops = {
+ .begin_backup = bbsink_lz4_begin_backup,
+ .begin_archive = bbsink_lz4_begin_archive,
+ .archive_contents = bbsink_lz4_archive_contents,
+ .end_archive = bbsink_lz4_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_lz4_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_lz4_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs lz4 compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_lz4_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBLZ4
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression is not supported by this build")));
+#else
+ bbsink_lz4 *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 12);
+
+ if (compresslevel < 0 || compresslevel > 12)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("lz4 compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_lz4));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBLZ4
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_lz4_begin_backup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t output_buffer_bound;
+ LZ4F_preferences_t *prefs = &mysink->prefs;
+
+ /* Initialize compressor object. */
+ memset(prefs, 0, sizeof(LZ4F_preferences_t));
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->compressionLevel = mysink->compresslevel;
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Since LZ4F_compressUpdate() requires an output buffer at least as
+ * large as the value computed by LZ4F_compressBound(), make sure the
+ * next sink's bbs_buffer is long enough to accommodate the compressed
+ * input buffer.
+ */
+ output_buffer_bound = LZ4F_compressBound(mysink->base.bbs_buffer_length,
+ &mysink->prefs);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_lz4_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ char *lz4_archive_name;
+ LZ4F_errorCode_t ctxError;
+ size_t headerSize;
+
+ ctxError = LZ4F_createCompressionContext(&mysink->ctx, LZ4F_VERSION);
+ if (LZ4F_isError(ctxError))
+ elog(ERROR, "could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
+
+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);
+
+ if (LZ4F_isError(headerSize))
+ elog(ERROR, "could not write lz4 header: %s",
+ LZ4F_getErrorName(headerSize));
+
+ /*
+ * The compressed data must be written after the header in the output
+ * buffer, so update our notion of the number of bytes written to it.
+ */
+ mysink->bytes_written += headerSize;
+
+ /* Add ".lz4" to the archive name. */
+ lz4_archive_name = psprintf("%s.lz4", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, lz4_archive_name);
+ pfree(lz4_archive_name);
+}
+
+/*
+ * Compress the input data into the output buffer until we run out of
+ * input data. Each time the free space in the output buffer falls below
+ * the compression bound for the input buffer, invoke the
+ * archive_contents() method of the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_lz4_end_archive() is invoked.
+ */
+static void
+bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t avail_in_bound;
+
+ avail_in_bound = LZ4F_compressBound(avail_in, &mysink->prefs);
+
+ /*
+ * If the space available in the output buffer has fallen below the value
+ * computed by LZ4F_compressBound(), ask the next sink to process the
+ * data so that we can empty the buffer.
+ */
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ avail_in_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ /*
+ * Compress the input buffer and write it into the output buffer.
+ */
+ compressedSize = LZ4F_compressUpdate(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ (uint8 *) mysink->base.bbs_buffer,
+ avail_in,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not compress data: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /*
+ * Update our notion of how many bytes we've written into output buffer.
+ */
+ mysink->bytes_written += compressedSize;
+}
+
+/*
+ * There might be some data inside lz4's internal buffers; flush that
+ * out, finalize the lz4 frame, and forward the result to the successor
+ * sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_lz4_end_archive(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+ size_t compressedSize;
+ size_t lz4_footer_bound;
+
+ lz4_footer_bound = LZ4F_compressBound(0, &mysink->prefs);
+
+ Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
+
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ lz4_footer_bound)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }
+
+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);
+
+ if (LZ4F_isError(compressedSize))
+ elog(ERROR, "could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressedSize));
+
+ /* Update our notion of how many bytes we've written. */
+ mysink->bytes_written += compressedSize;
+
+ /* Send whatever accumulated output bytes we have. */
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+
+ /* Release the resources. */
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_lz4_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails midway, free the compression context by
+ * calling LZ4F_freeCompressionContext(), if needed, to avoid a memory
+ * leak.
+ */
+static void
+bbsink_lz4_cleanup(bbsink *sink)
+{
+ bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
+
+ if (mysink->ctx)
+ {
+ LZ4F_freeCompressionContext(mysink->ctx);
+ mysink->ctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index c40925c1f0..923659ddee 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -1003,6 +1003,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
+ {
+ *methodres = COMPRESSION_LZ4;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1930,6 +1935,9 @@ BaseBackup(void)
case COMPRESSION_GZIP:
compressmethodstr = "gzip";
break;
+ case COMPRESSION_LZ4:
+ compressmethodstr = "lz4";
+ break;
default:
Assert(false);
break;
@@ -2772,8 +2780,12 @@ main(int argc, char **argv)
}
break;
case COMPRESSION_LZ4:
- /* option not supported */
- Assert(false);
+ if (compresslevel > 12)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 12",
+ compresslevel, "lz4");
+ exit(1);
+ }
break;
}
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 1ae818f9a1..851233a6e0 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -9,6 +9,7 @@ export TAR
# used by the command "gzip" to pass down options, so stick with a different
# name.
export GZIP_PROGRAM=$(GZIP)
+export LZ4=$(LZ4)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index d32c86e92e..9d5b0e139a
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -11,7 +11,7 @@ use Config;
use File::Path qw(rmtree);
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 6;
+use Test::More tests => 9;
my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
@@ -35,6 +35,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'GZIP_PROGRAM'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ },
+ {
+ 'compression_method' => 'lz4',
+ 'backup_flags' => ['--compress', 'server-lz4'],
+ 'backup_archive' => 'base.tar.lz4',
+ 'decompress_program' => $ENV{'LZ4'},
+ 'decompress_flags' => [ '-d', '-m'],
+ 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
}
);
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index 2cfa816bb8..a3f8d37258 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -284,6 +284,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
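For readers less familiar with the lz4 frame API that both the client- and
server-side patches lean on, here is a minimal, self-contained sketch of
the begin/update/end lifecycle they follow (all names are illustrative and
error handling is reduced to early exits; this is not code from either
patch):

#include <string.h>
#include <lz4frame.h>

/* Compress one in-memory buffer as a single lz4 frame. */
static size_t
lz4_frame_compress(const char *in, size_t in_len,
                   char *out, size_t out_cap, int level)
{
    LZ4F_compressionContext_t cctx;
    LZ4F_preferences_t prefs;
    size_t      pos = 0;
    size_t      n;

    memset(&prefs, 0, sizeof(prefs));
    prefs.compressionLevel = level;

    /* One-shot bound covering frame header, payload, and footer. */
    if (out_cap < LZ4F_compressFrameBound(in_len, &prefs))
        return 0;

    if (LZ4F_isError(LZ4F_createCompressionContext(&cctx, LZ4F_VERSION)))
        return 0;

    n = LZ4F_compressBegin(cctx, out, out_cap, &prefs); /* frame header */
    if (!LZ4F_isError(n))
    {
        pos += n;
        n = LZ4F_compressUpdate(cctx, out + pos, out_cap - pos,
                                in, in_len, NULL);      /* payload */
    }
    if (!LZ4F_isError(n))
    {
        pos += n;
        n = LZ4F_compressEnd(cctx, out + pos, out_cap - pos,
                             NULL);                     /* frame footer */
    }

    LZ4F_freeCompressionContext(cctx);
    return LZ4F_isError(n) ? 0 : pos + n;
}

The streaming code in the patches performs the same three steps, but keeps
the context alive across many LZ4F_compressUpdate() calls and flushes the
output buffer to the next streamer or sink whenever the space remaining in
it drops below LZ4F_compressBound().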
Sure, please find the rebased patch attached.
Thanks, I have validated v2 patch on top of rebased patch.
Thanks,
Dipesh
On Fri, Feb 11, 2022 at 5:58 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase.
Sure, please find the rebased patch attached.
It's Friday today, but I'm feeling brave, and it's still morning here,
so ... committed.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Sure, please find the rebased patch attached.
Thanks, I have validated v2 patch on top of rebased patch.
I'm still feeling brave, so I committed this too after fixing a few
things. In the process I noticed that we don't have support for LZ4
compression of streamed WAL (cf. CreateWalTarMethod). It would be good
to fix that. I'm not quite sure whether
/messages/by-id/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
is basically what we need or whether something else is required.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks Robert for the bravery :-)
Regards,
Jeevan Ladhe
On Fri, Feb 11, 2022 at 08:35:25PM +0530, Jeevan Ladhe wrote:
Thanks Robert for the bravery :-)
FYI: there's a couple typos in the last 2 patches.
I added them to my typos branch; feel free to wait until April if you'd prefer
to see them fixed in bulk.
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd19..649b91208f3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -419,7 +419,7 @@ PostgreSQL documentation
<para>
The compression method can be set to <literal>gzip</literal> or
<literal>lz4</literal>, or <literal>none</literal> for no
- compression. A compression level can be optionally specified, by
+ compression. A compression level can optionally be specified, by
appending the level number after a colon (<literal>:</literal>). If no
level is specified, the default compression level will be used. If
only a level is specified without mentioning an algorithm,
@@ -440,7 +440,7 @@ PostgreSQL documentation
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
compression is selected, but will not be compressed if server-side
- compresion or LZ4 compresion is selected.
+ compression or LZ4 compression is selected.
</para>
</listitem>
</varlistentry>
On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
FYI: there's a couple typos in the last 2 patches.
Hmm. OK. But I don't consider "can be optionally specified" incorrect
or worse than "can optionally be specified".
I do agree that spelling words correctly is a good idea.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi, Hackers.
Thank you for developing a great feature.
The current help message shown below does not seem to allow specifying 'client-' or 'server-' for lz4 compression.
--compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]
The attached small patch fixes the help message as follows:
--compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]
Regards,
Noriyoshi Shinoda
Attachments:
pg_basebackup_help_v1.diffapplication/octet-stream; name=pg_basebackup_help_v1.diffDownload
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0003b59..b96df24 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client,server}-]{gzip,lz4},none}[:LEVEL] or [LEVEL]\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
The LZ4 patches caused new compiler warnings.
It's the same issue that was fixed at 71cbbbbe8 for gzip.
I think they would've been visible in the CI environment, too.
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=wrasse&dt=2022-02-12%2005%3A08%3A48&stg=make
"/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/basebackup_lz4.c", line 87: warning: Function has no return statement : bbsink_lz4_new
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=bowerbird&dt=2022-02-12%2013%3A11%3A20&stg=make
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hamerkop&dt=2022-02-12%2010%3A04%3A08&stg=make
warning C4715: 'bbsink_lz4_new': not all control paths return a value
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=anole&dt=2022-02-12%2005%3A46%3A44&stg=make
"basebackup_lz4.c", line 87: warning #2940-D: missing return statement at end of non-void function "bbsink_lz4_new"
Hi,
On 2022-02-12 15:12:21 -0600, Justin Pryzby wrote:
I think they would've been visible in the CI environment, too.
Yea, but only if you looked carefully enough. The postgres github repo has CI
enabled, and it's green. But the windows build step does show the warnings:
https://cirrus-ci.com/task/6185407539838976?logs=build#L2066
https://cirrus-ci.com/github/postgres/postgres/
[19:08:09.086] c:\cirrus\src\backend\replication\basebackup_lz4.c(87): warning C4715: 'bbsink_lz4_new': not all control paths return a value [c:\cirrus\postgres.vcxproj]
Probably worth scripting something to make the windows task error out if there
had been warnings, but only after running the tests.
Greetings,
Andres Freund
On Sat, Feb 12, 2022 at 1:01 AM Shinoda, Noriyoshi (PN Japan FSIP)
<noriyoshi.shinoda@hpe.com> wrote:
Thank you for developing a great feature.
The current help message shown below does not seem to allow specifying 'client-' or 'server-' for lz4 compression.
--compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]

The attached small patch fixes the help message as follows:
--compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]
Hmm. After studying this a bit more closely, I think this might
actually need a bit more revision than what you propose here. In most
places, we use vertical bars to separate alternatives:
-X, --wal-method=none|fetch|stream
But here, we're using commas in some places and the word "or" in one
case as well:
-Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]
We're also not consistently using braces for grouping, which makes the
order of operations a bit unclear, and it makes no sense to put
brackets around LEVEL when it's the only thing that's part of that
alternative.
A more consistent way of writing the supported syntax would be like this:
-Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}
I would be somewhat inclined to leave the level-only variant
undocumented and instead write it like this:
-Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}
--
Robert Haas
EDB: http://www.enterprisedb.com
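Rendered as the actual usage() string in pg_basebackup.c, that last variant
would presumably look like this (a sketch of the proposed wording only, not
a committed change; the option string is copied verbatim from the suggestion
above):

printf(_("  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}\n"
         "                 compress tar output with given compression method or level\n"));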
Hi,
Please find the attached updated version of patch for ZSTD server side
compression.
This patch has following changes:
- Fixes the issue Tushar reported[1]/messages/by-id/6c3f1558-1e56-9946-78a2-c59340da1dbf@enterprisedb.com.
- Adds a tap test.
- Makes document changes related to zstd.
- Updates the pg_basebackup help message. Here I have chosen the
suggestion by Robert upthread (as given below):
I would be somewhat inclined to leave the level-only variant
undocumented and instead write it like this:
-Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}
- pg_indent on basebackup_zstd.c.
Thanks Tushar, for offline help for testing the patch.
[1]: /messages/by-id/6c3f1558-1e56-9946-78a2-c59340da1dbf@enterprisedb.com
Regards,
Jeevan Ladhe
Attachments:
v10-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patchapplication/octet-stream; name=v10-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patchDownload
From a494ec33c2b72176afd3f7decfe571c969133012 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Tue, 15 Feb 2022 18:45:52 +0530
Subject: [PATCH] Add a ZSTD compression method for server side compression.
This patch introduces --compress=server-zstd[:LEVEL]
Add tap test.
Add config option --with-zstd.
Add documentation for ZSTD option.
Add pg_basebackup help for ZSTD option.
Example:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-zstd:4
---
configure | 295 +++++++++++++++++++++-
configure.ac | 33 +++
doc/src/sgml/protocol.sgml | 5 +-
doc/src/sgml/ref/pg_basebackup.sgml | 38 +--
src/Makefile.global.in | 1 +
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_zstd.c | 294 +++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 18 +-
src/bin/pg_basebackup/pg_receivewal.c | 4 +
src/bin/pg_basebackup/walmethods.h | 1 +
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 9 +
src/include/pg_config.h.in | 6 +
src/include/replication/basebackup_sink.h | 1 +
15 files changed, 686 insertions(+), 28 deletions(-)
create mode 100644 src/backend/replication/basebackup_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/configure b/configure
index 9305555658..fc83c17c68 100755
--- a/configure
+++ b/configure
@@ -650,6 +650,7 @@ CFLAGS_ARMV8_CRC32C
CFLAGS_SSE42
have_win32_dbghelp
LIBOBJS
+ZSTD
LZ4
UUID_LIBS
LDAP_LIBS_BE
@@ -700,6 +701,9 @@ with_gnu_ld
LD
LDFLAGS_SL
LDFLAGS_EX
+ZSTD_LIBS
+ZSTD_CFLAGS
+with_zstd
LZ4_LIBS
LZ4_CFLAGS
with_lz4
@@ -801,6 +805,7 @@ infodir
docdir
oldincludedir
includedir
+runstatedir
localstatedir
sharedstatedir
sysconfdir
@@ -869,6 +874,7 @@ with_libxslt
with_system_tzdata
with_zlib
with_lz4
+with_zstd
with_gnu_ld
with_ssl
with_openssl
@@ -898,6 +904,8 @@ XML2_CFLAGS
XML2_LIBS
LZ4_CFLAGS
LZ4_LIBS
+ZSTD_CFLAGS
+ZSTD_LIBS
LDFLAGS_EX
LDFLAGS_SL
PERL
@@ -942,6 +950,7 @@ datadir='${datarootdir}'
sysconfdir='${prefix}/etc'
sharedstatedir='${prefix}/com'
localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
includedir='${prefix}/include'
oldincludedir='/usr/include'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1194,6 +1203,15 @@ do
| -silent | --silent | --silen | --sile | --sil)
silent=yes ;;
+ -runstatedir | --runstatedir | --runstatedi | --runstated \
+ | --runstate | --runstat | --runsta | --runst | --runs \
+ | --run | --ru | --r)
+ ac_prev=runstatedir ;;
+ -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+ | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+ | --run=* | --ru=* | --r=*)
+ runstatedir=$ac_optarg ;;
+
-sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
ac_prev=sbindir ;;
-sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1331,7 +1349,7 @@ fi
for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
- libdir localedir mandir
+ libdir localedir mandir runstatedir
do
eval ac_val=\$$ac_var
# Remove trailing slashes.
@@ -1484,6 +1502,7 @@ Fine tuning of the installation directories:
--sysconfdir=DIR read-only single-machine data [PREFIX/etc]
--sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
--localstatedir=DIR modifiable single-machine data [PREFIX/var]
+ --runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run]
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -1577,6 +1596,7 @@ Optional Packages:
use system time zone data in DIR
--without-zlib do not use Zlib
--with-lz4 build with LZ4 support
+ --with-zstd build with ZSTD support
--with-gnu-ld assume the C compiler uses GNU ld [default=no]
--with-ssl=LIB use LIB for SSL/TLS support (openssl)
--with-openssl obsolete spelling of --with-ssl=openssl
@@ -1606,6 +1626,8 @@ Some influential environment variables:
XML2_LIBS linker flags for XML2, overriding pkg-config
LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config
LZ4_LIBS linker flags for LZ4, overriding pkg-config
+ ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config
+ ZSTD_LIBS linker flags for ZSTD, overriding pkg-config
LDFLAGS_EX extra linker flags for linking executables only
LDFLAGS_SL extra linker flags for linking shared libraries only
PERL Perl program
@@ -9034,6 +9056,146 @@ fi
done
fi
+#
+# ZSTD
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with ZSTD support" >&5
+$as_echo_n "checking whether to build with ZSTD support... " >&6; }
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+ withval=$with_zstd;
+ case $withval in
+ yes)
+
+$as_echo "#define USE_ZSTD 1" >>confdefs.h
+
+ ;;
+ no)
+ :
+ ;;
+ *)
+ as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+ ;;
+ esac
+
+else
+ with_zstd=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_zstd" >&5
+$as_echo "$with_zstd" >&6; }
+
+
+if test "$with_zstd" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
+$as_echo_n "checking for libzstd... " >&6; }
+
+if test -n "$ZSTD_CFLAGS"; then
+ pkg_cv_ZSTD_CFLAGS="$ZSTD_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+if test -n "$ZSTD_LIBS"; then
+ pkg_cv_ZSTD_LIBS="$ZSTD_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+ _pkg_short_errors_supported=yes
+else
+ _pkg_short_errors_supported=no
+fi
+ if test $_pkg_short_errors_supported = yes; then
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libzstd" 2>&1`
+ else
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libzstd" 2>&1`
+ fi
+ # Put the nasty error message in config.log where it belongs
+ echo "$ZSTD_PKG_ERRORS" >&5
+
+ as_fn_error $? "Package requirements (libzstd) were not met:
+
+$ZSTD_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+ ZSTD_CFLAGS=$pkg_cv_ZSTD_CFLAGS
+ ZSTD_LIBS=$pkg_cv_ZSTD_LIBS
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -13130,6 +13292,56 @@ fi
fi
+if test "$with_zstd" = yes ; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+ Use char because int might match the return type of a GCC
+ builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+ ;
+ return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+ ac_cv_lib_zstd_ZSTD_compress=yes
+else
+ ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+ LIBS="-lzstd $LIBS"
+
+else
+ as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -13904,6 +14116,77 @@ done
fi
+if test -z "$ZSTD"; then
+ for ac_prog in zstd
+do
+ # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if ${ac_cv_path_ZSTD+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ case $ZSTD in
+ [\\/]* | ?:[\\/]*)
+ ac_cv_path_ZSTD="$ZSTD" # Let the user override the test with a path.
+ ;;
+ *)
+ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+ IFS=$as_save_IFS
+ test -z "$as_dir" && as_dir=.
+ for ac_exec_ext in '' $ac_executable_extensions; do
+ if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
+ ac_cv_path_ZSTD="$as_dir/$ac_word$ac_exec_ext"
+ $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+ break 2
+ fi
+done
+ done
+IFS=$as_save_IFS
+
+ ;;
+esac
+fi
+ZSTD=$ac_cv_path_ZSTD
+if test -n "$ZSTD"; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+else
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+ test -n "$ZSTD" && break
+done
+
+else
+ # Report the value of ZSTD in configure's output in all cases.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD" >&5
+$as_echo_n "checking for ZSTD... " >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+fi
+
+if test "$with_zstd" = yes; then
+ for ac_header in zstd.h
+do :
+ ac_fn_c_check_header_mongrel "$LINENO" "zstd.h" "ac_cv_header_zstd_h" "$ac_includes_default"
+if test "x$ac_cv_header_zstd_h" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_ZSTD_H 1
+_ACEOF
+
+else
+ as_fn_error $? "zstd.h header file is required for ZSTD" "$LINENO" 5
+fi
+
+done
+
+fi
+
if test "$with_gssapi" = yes ; then
for ac_header in gssapi/gssapi.h
do :
@@ -15307,7 +15590,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15353,7 +15636,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15377,7 +15660,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15422,7 +15705,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15446,7 +15729,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
diff --git a/configure.ac b/configure.ac
index 16167329fc..729b23fbea 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1056,6 +1056,30 @@ if test "$with_lz4" = yes; then
done
fi
+#
+# ZSTD
+#
+AC_MSG_CHECKING([whether to build with ZSTD support])
+PGAC_ARG_BOOL(with, zstd, no, [build with ZSTD support],
+ [AC_DEFINE([USE_ZSTD], 1, [Define to 1 to build with ZSTD support. (--with-zstd)])])
+AC_MSG_RESULT([$with_zstd])
+AC_SUBST(with_zstd)
+
+if test "$with_zstd" = yes; then
+ PKG_CHECK_MODULES(ZSTD, libzstd)
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -1325,6 +1349,10 @@ if test "$with_lz4" = yes ; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
+if test "$with_zstd" = yes ; then
+ AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])])
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -1490,6 +1518,11 @@ if test "$with_lz4" = yes; then
AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])
fi
+PGAC_PATH_PROGS(ZSTD, zstd)
+if test "$with_zstd" = yes; then
+ AC_CHECK_HEADERS(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])])
+fi
+
if test "$with_gssapi" = yes ; then
AC_CHECK_HEADERS(gssapi/gssapi.h, [],
[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1c5ab00879..c13d25051c 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2725,7 +2725,7 @@ The commands accepted in replication mode are:
<para>
Instructs the server to compress the backup using the specified
method. Currently, the supported methods are <literal>gzip</literal>
- and <literal>lz4</literal>.
+ <literal>lz4</literal>, and <literal>zstd</literal>.
</para>
</listitem>
</varlistentry>
@@ -2737,7 +2737,8 @@ The commands accepted in replication mode are:
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
For <literal>gzip</literal> the value should be an integer between 1
- and 9, and for <literal>lz4</literal> it should be between 1 and 12.
+ and 9, for <literal>lz4</literal> between 1 and 12, and for
+   <literal>zstd</literal> between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd1..4cf28a2a61 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,30 +417,32 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to <literal>gzip</literal> or
- <literal>lz4</literal>, or <literal>none</literal> for no
- compression. A compression level can be optionally specified, by
- appending the level number after a colon (<literal>:</literal>). If no
- level is specified, the default compression level will be used. If
- only a level is specified without mentioning an algorithm,
- <literal>gzip</literal> compression will be used if the level is
- greater than 0, and no compression will be used if the level is 0.
- </para>
- <para>
- When the tar format is used with <literal>gzip</literal> or
- <literal>lz4</literal>, the suffix <filename>.gz</filename> or
- <filename>.lz4</filename> will automatically be added to all tar
- filenames. When the plain format is used, client-side compression may
- not be specified, but it is still possible to request server-side
- compression. If this is done, the server will compress the backup for
- transmission, and the client will decompress and extract it.
+ The compression method can be set to <literal>gzip</literal>,
+ <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for no compression. A compression level can
+ optionally be specified, by appending the level number after a colon
+ (<literal>:</literal>). If no level is specified, the default
+ compression level will be used. If only a level is specified without
+ mentioning an algorithm, <literal>gzip</literal> compression will be
+ used if the level is greater than 0, and no compression will be used if
+ the level is 0.
+ </para>
+ <para>
+ When the tar format is used with <literal>gzip</literal>,
+ <literal>lz4</literal>, or <literal>zstd</literal>, the suffix
+ <filename>.gz</filename>, <filename>.lz4</filename>, or
+ <filename>.zst</filename> respectively will be automatically added to
+ all tar filenames. When the plain format is used, client-side
+ compression may not be specified, but it is still possible to request
+ server-side compression. If this is done, the server will compress the
+ backup for transmission, and the client will decompress and extract it.
</para>
<para>
When this option is used in combination with
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
compression is selected, but will not be compressed if server-side
- compresion or LZ4 compresion is selected.
+ compression, LZ4, or ZSTD compression is selected.
</para>
</listitem>
</varlistentry>
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 9dcd54fcbd..c980444233 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -351,6 +351,7 @@ XGETTEXT = @XGETTEXT@
GZIP = gzip
BZIP2 = bzip2
LZ4 = @LZ4@
+ZSTD = @ZSTD@
DOWNLOAD = wget -O $@ --no-use-server-timestamps
#DOWNLOAD = curl -o $@
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..2e6de7007f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -20,6 +20,7 @@ OBJS = \
basebackup_copy.o \
basebackup_gzip.o \
basebackup_lz4.o \
+ basebackup_zstd.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0bf28b55d7..2378ce5c5e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
{
BACKUP_COMPRESSION_NONE,
BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
} basebackup_compression_type;
typedef struct
@@ -906,6 +907,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
else if (strcmp(optval, "lz4") == 0)
opt->compression = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(optval, "zstd") == 0)
+ opt->compression = BACKUP_COMPRESSION_ZSTD;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1026,6 +1029,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
sink = bbsink_gzip_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
+ sink = bbsink_zstd_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
new file mode 100644
index 0000000000..d99b3698f6
--- /dev/null
+++ b/src/backend/replication/basebackup_zstd.c
@@ -0,0 +1,294 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_zstd.c
+ * Basebackup sink implementing zstd compression.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbsink_zstd
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level */
+ int compresslevel;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbsink_zstd;
+
+static void bbsink_zstd_begin_backup(bbsink *sink);
+static void bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_zstd_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_zstd_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_zstd_end_archive(bbsink *sink);
+static void bbsink_zstd_cleanup(bbsink *sink);
+static void bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+const bbsink_ops bbsink_zstd_ops = {
+ .begin_backup = bbsink_zstd_begin_backup,
+ .begin_archive = bbsink_zstd_begin_archive,
+ .archive_contents = bbsink_zstd_archive_contents,
+ .end_archive = bbsink_zstd_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_zstd_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_zstd_end_backup,
+ .cleanup = bbsink_zstd_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs zstd compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+#else
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 22);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_zstd));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_zstd_begin_backup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t output_buffer_bound;
+
+ mysink->cctx = ZSTD_createCCtx();
+ if (!mysink->cctx)
+ elog(ERROR, "could not create zstd compression context");
+
+ ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Make sure that the next sink's bbs_buffer is big enough to accommodate
+ * the compressed input buffer.
+ */
+ output_buffer_bound = ZSTD_compressBound(mysink->base.bbs_buffer_length);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ char *zstd_archive_name;
+
+ /*
+ * At the start of each archive we reset the state to start a new
+ * compression operation. The compression parameters are sticky, so they
+ * survive the reset because we use the ZSTD_reset_session_only option.
+ */
+ ZSTD_CCtx_reset(mysink->cctx, ZSTD_reset_session_only);
+
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ /* Add ".zst" to the archive name. */
+ zstd_archive_name = psprintf("%s.zst", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, zstd_archive_name);
+ pfree(zstd_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method on the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_zstd_end_archive() is invoked.
+ */
+static void
+bbsink_zstd_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ ZSTD_inBuffer inBuf = {mysink->base.bbs_buffer, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx, &mysink->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * There might be some data inside zstd's internal buffers; we need to get that
+ * flushed out, also end the zstd frame and then get that forwarded to the
+ * successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_zstd_end_archive(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx,
+ &mysink->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ if (mysink->zstd_outBuf.pos > 0)
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Free the resources and context.
+ */
+static void
+bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+
+ bbsink_forward_end_backup(sink, endptr, endtli);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling ZSTD_freeCCtx if needed, to avoid a memory leak.
+ */
+static void
+bbsink_zstd_cleanup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context if not already released. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0003b59615..3adb3a3845 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client|server}-]{gzip|lz4|zstd}}[:LEVEL]|none}\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -1023,6 +1023,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1970,6 +1975,9 @@ BaseBackup(void)
case COMPRESSION_LZ4:
compressmethodstr = "lz4";
break;
+ case COMPRESSION_ZSTD:
+ compressmethodstr = "zstd";
+ break;
default:
Assert(false);
break;
@@ -2819,6 +2827,14 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case COMPRESSION_ZSTD:
+ if (compresslevel > 22)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 22",
+ compresslevel, "zstd");
+ exit(1);
+ }
+ break;
}
/*
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ccb215c398..9b7656c692 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -904,6 +904,10 @@ main(int argc, char **argv)
exit(1);
#endif
break;
+ case COMPRESSION_ZSTD:
+ pg_log_error("compression with %s is not yet supported", "ZSTD");
+ exit(1);
+
}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 2dfb353baa..ec54019cfc 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -24,6 +24,7 @@ typedef enum
{
COMPRESSION_GZIP,
COMPRESSION_LZ4,
+ COMPRESSION_ZSTD,
COMPRESSION_NONE
} WalCompressionMethod;
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 851233a6e0..596df15118 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -10,6 +10,7 @@ export TAR
# name.
export GZIP_PROGRAM=$(GZIP)
export LZ4=$(LZ4)
+export ZSTD=$(ZSTD)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index 6927ca4c74..1ccc6cb9df
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -43,6 +43,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d', '-m'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
@@ -108,6 +116,7 @@ for my $tc (@test_configuration)
# Cleanup.
unlink($backup_path . '/backup_manifest');
unlink($backup_path . '/base.tar');
+ unlink($backup_path . '/' . $tc->{'backup_archive'});
rmtree($extract_path);
}
}
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 28a1f0e9f0..26e373e9f7 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -325,6 +325,9 @@
/* Define to 1 if you have the `lz4' library (-llz4). */
#undef HAVE_LIBLZ4
+/* Define to 1 if you have the `zstd' library (-lzstd). */
+#undef HAVE_LIBZSTD
+
/* Define to 1 if you have the `m' library (-lm). */
#undef HAVE_LIBM
@@ -367,6 +370,9 @@
/* Define to 1 if you have the <lz4.h> header file. */
#undef HAVE_LZ4_H
+/* Define to 1 if you have the <zstd.h> header file. */
+#undef HAVE_ZSTD_H
+
/* Define to 1 if you have the <mbarrier.h> header file. */
#undef HAVE_MBARRIER_H
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a3f8d37258..a7f16758a4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
On Wed, Feb 9, 2022 at 8:41 AM Abhijit Menon-Sen <ams@toroid.org> wrote:
> It took me a while to assimilate these patches, including the backup
> targets one, which I hadn't looked at before. Now that I've wrapped my
> head around how to put the pieces together, I really like the idea. As
> you say, writing non-trivial integrations in C will take some effort,
> but it seems worthwhile. It's also nice that one can continue to use
> pg_basebackup to trigger the backups and see progress information.
Cool. Thanks for having a look.
> Yes, it looks simple to follow the example set by basebackup_to_shell to
> write a custom target. The complexity will be in whatever we need to do
> to store/forward the backup data, rather than in obtaining the data in
> the first place, which is exactly as it should be.
Yeah, that's what made me really happy with how this came out.
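To make that concrete, here is a rough sketch of the skeleton such an
extension needs, written against the BaseBackupAddTarget() API from the
attached patch. The "demo" target name and its pass-through sink are
invented for illustration, not taken from basebackup_to_shell; a real
target would palloc a struct embedding a bbsink and point bbs_ops at
its own callback table.

#include "postgres.h"

#include "fmgr.h"
#include "replication/basebackup_target.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

/*
 * Hypothetical check_detail callback: insist that a detail was given
 * and pass it through to the get_sink callback unchanged.
 */
static void *
demo_check_detail(char *target, char *target_detail)
{
	if (target_detail == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("target '%s' requires a target detail", target)));
	return target_detail;
}

/*
 * Hypothetical get_sink callback: returning next_sink unchanged just
 * forwards all the data, which makes this behave like the built-in
 * 'blackhole' target. A real implementation would construct and
 * return its own bbsink here.
 */
static bbsink *
demo_get_sink(bbsink *next_sink, void *detail_arg)
{
	return next_sink;
}

void
_PG_init(void)
{
	BaseBackupAddTarget("demo", demo_check_detail, demo_get_sink);
}

With that module loaded via shared_preload_libraries, something like
"pg_basebackup --target demo:something -Xfetch" should route the backup
through it, assuming the API lands as posted.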
Here's v2, rebased and with documentation added.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Allow-extensions-to-add-new-backup-targets.patch
From 646b572fd08e144fcc792307a596821e617931f6 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 15 Feb 2022 11:24:12 -0500
Subject: [PATCH v2 1/2] Allow extensions to add new backup targets.
Commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc allowed for base backup
targets, meaning that we could do something with the backup other than
send it to the client, but all of those targets had to be baked in to
the core code. This commit makes it possible for extensions to define
additional backup targets.
---
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 82 +++----
src/backend/replication/basebackup_target.c | 238 ++++++++++++++++++++
src/include/replication/basebackup_target.h | 66 ++++++
4 files changed, 332 insertions(+), 55 deletions(-)
create mode 100644 src/backend/replication/basebackup_target.c
create mode 100644 src/include/replication/basebackup_target.h
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..a363e7f94d 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -23,6 +23,7 @@ OBJS = \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
+ basebackup_target.o \
basebackup_throttle.o \
repl_gram.o \
slot.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0bf28b55d7..d32f1bc15d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -28,6 +28,7 @@
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
#include "replication/basebackup_sink.h"
+#include "replication/basebackup_target.h"
#include "replication/backup_manifest.h"
#include "replication/walsender.h"
#include "replication/walsender_private.h"
@@ -53,13 +54,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_TARGET_BLACKHOLE,
- BACKUP_TARGET_CLIENT,
- BACKUP_TARGET_SERVER
-} backup_target_type;
-
typedef enum
{
BACKUP_COMPRESSION_NONE,
@@ -76,8 +70,9 @@ typedef struct
bool includewal;
uint32 maxrate;
bool sendtblspcmapfile;
- backup_target_type target;
- char *target_detail;
+ bool send_to_client;
+ bool use_copytblspc;
+ BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
basebackup_compression_type compression;
int compression_level;
@@ -714,12 +709,12 @@ parse_basebackup_options(List *options, basebackup_options *opt)
bool o_manifest_checksums = false;
bool o_target = false;
bool o_target_detail = false;
- char *target_str = "compat"; /* placate compiler */
+ char *target_str = NULL;
+ char *target_detail_str = NULL;
bool o_compression = false;
bool o_compression_level = false;
MemSet(opt, 0, sizeof(*opt));
- opt->target = BACKUP_TARGET_CLIENT;
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
@@ -863,22 +858,11 @@ parse_basebackup_options(List *options, basebackup_options *opt)
}
else if (strcmp(defel->defname, "target") == 0)
{
- target_str = defGetString(defel);
-
if (o_target)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(target_str, "blackhole") == 0)
- opt->target = BACKUP_TARGET_BLACKHOLE;
- else if (strcmp(target_str, "client") == 0)
- opt->target = BACKUP_TARGET_CLIENT;
- else if (strcmp(target_str, "server") == 0)
- opt->target = BACKUP_TARGET_SERVER;
- else
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized target: \"%s\"", target_str)));
+ target_str = defGetString(defel);
o_target = true;
}
else if (strcmp(defel->defname, "target_detail") == 0)
@@ -889,7 +873,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->target_detail = optval;
+ target_detail_str = optval;
o_target_detail = true;
}
else if (strcmp(defel->defname, "compression") == 0)
@@ -939,22 +923,28 @@ parse_basebackup_options(List *options, basebackup_options *opt)
errmsg("manifest checksums require a backup manifest")));
opt->manifest_checksum_type = CHECKSUM_TYPE_NONE;
}
- if (opt->target == BACKUP_TARGET_SERVER)
+
+ if (target_str == NULL)
{
- if (opt->target_detail == NULL)
+ if (target_detail_str != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("target '%s' requires a target detail",
- target_str)));
+ errmsg("target detail cannot be used without target")));
+ opt->use_copytblspc = true;
+ opt->send_to_client = true;
}
- else
+ else if (strcmp(target_str, "client") == 0)
{
- if (opt->target_detail != NULL)
+ if (target_detail_str != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("target '%s' does not accept a target detail",
target_str)));
+ opt->send_to_client = true;
}
+ else
+ opt->target_handle =
+ BaseBackupGetTargetHandle(target_str, target_detail_str);
if (o_compression_level && !o_compression)
ereport(ERROR,
@@ -990,32 +980,14 @@ SendBaseBackup(BaseBackupCmd *cmd)
}
/*
- * If the TARGET option was specified, then we can use the new copy-stream
- * protocol. If the target is specifically 'client' then set up to stream
- * the backup to the client; otherwise, it's being sent someplace else and
- * should not be sent to the client.
- */
- if (opt.target == BACKUP_TARGET_CLIENT)
- sink = bbsink_copystream_new(true);
- else
- sink = bbsink_copystream_new(false);
-
- /*
- * If a non-default backup target is in use, arrange to send the data
- * wherever it needs to go.
+ * If the target is specifically 'client' then set up to stream the backup
+ * to the client; otherwise, it's being sent someplace else and should not
+ * be sent to the client. BaseBackupGetSink has the job of setting up a
+ * sink to send the backup data wherever it needs to go.
*/
- switch (opt.target)
- {
- case BACKUP_TARGET_BLACKHOLE:
- /* Nothing to do, just discard data. */
- break;
- case BACKUP_TARGET_CLIENT:
- /* Nothing to do, handling above is sufficient. */
- break;
- case BACKUP_TARGET_SERVER:
- sink = bbsink_server_new(sink, opt.target_detail);
- break;
- }
+ sink = bbsink_copystream_new(opt.send_to_client);
+ if (opt.target_handle != NULL)
+ sink = BaseBackupGetSink(opt.target_handle, sink);
/* Set up network throttling, if client requested it */
if (opt.maxrate > 0)
diff --git a/src/backend/replication/basebackup_target.c b/src/backend/replication/basebackup_target.c
new file mode 100644
index 0000000000..d93f5e02db
--- /dev/null
+++ b/src/backend/replication/basebackup_target.c
@@ -0,0 +1,238 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_target.c
+ * Base backups can be "targeted," which means that they can be sent
+ * somewhere other than to the client that requested the backup.
+ * Furthermore, new targets can be defined by extensions. This file
+ * contains code to support that functionality.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_target.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "replication/basebackup_target.h"
+#include "utils/memutils.h"
+
+typedef struct BaseBackupTargetType
+{
+ char *name;
+ void *(*check_detail) (char *, char *);
+ bbsink *(*get_sink) (bbsink *, void *);
+} BaseBackupTargetType;
+
+struct BaseBackupTargetHandle
+{
+ BaseBackupTargetType *type;
+ void *detail_arg;
+};
+
+static void initialize_target_list(void);
+extern bbsink *blackhole_get_sink(bbsink *next_sink, void *detail_arg);
+extern bbsink *server_get_sink(bbsink *next_sink, void *detail_arg);
+static void *reject_target_detail(char *target, char *target_detail);
+static void *server_check_detail(char *target, char *target_detail);
+
+static BaseBackupTargetType builtin_backup_targets[] =
+{
+ {
+ "blackhole", reject_target_detail, blackhole_get_sink
+ },
+ {
+ "server", server_check_detail, server_get_sink
+ },
+ {
+ NULL
+ }
+};
+
+static List *BaseBackupTargetTypeList = NIL;
+
+/*
+ * Add a new base backup target type.
+ *
+ * This is intended for use by server extensions.
+ */
+void
+BaseBackupAddTarget(char *name,
+ void *(*check_detail) (char *, char *),
+ bbsink *(*get_sink) (bbsink *, void *))
+{
+ BaseBackupTargetType *ttype;
+ MemoryContext oldcontext;
+ ListCell *lc;
+
+ /* If the target list is not yet initialized, do that first. */
+ if (BaseBackupTargetTypeList == NIL)
+ initialize_target_list();
+
+ /* Search the target type list for an existing entry with this name. */
+ foreach(lc, BaseBackupTargetTypeList)
+ {
+ BaseBackupTargetType *ttype = lfirst(lc);
+
+ if (strcmp(ttype->name, name) == 0)
+ {
+ /*
+ * We found one, so update it.
+ *
+ * It is probably not a great idea to call BaseBackupAddTarget
+ * for the same name multiple times, but if it happens, this
+ * seems like the sanest behavior.
+ */
+ ttype->check_detail = check_detail;
+ ttype->get_sink = get_sink;
+ return;
+ }
+ }
+
+ /*
+ * We use TopMemoryContext for allocations here to make sure that the
+ * data we need doesn't vanish under us; that's also why we copy the
+ * target name into a newly-allocated chunk of memory.
+ */
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ ttype = palloc(sizeof(BaseBackupTargetType));
+ ttype->name = pstrdup(name);
+ ttype->check_detail = check_detail;
+ ttype->get_sink = get_sink;
+ BaseBackupTargetTypeList = lappend(BaseBackupTargetTypeList, ttype);
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Look up a base backup target and validate the target_detail.
+ *
+ * Extensions that define new backup targets will probably define a new
+ * type of bbsink to match. Validation of the target_detail can be performed
+ * either in the check_detail routine called here, or in the bbsink
+ * constructor, which will be called from BaseBackupGetSink. It's mostly
+ * a matter of taste, but the check_detail function runs somewhat earlier.
+ */
+BaseBackupTargetHandle *
+BaseBackupGetTargetHandle(char *target, char *target_detail)
+{
+ ListCell *lc;
+
+ /* If the target list is not yet initialized, do that first. */
+ if (BaseBackupTargetTypeList == NIL)
+ initialize_target_list();
+
+ /* Search the target type list for a match. */
+ foreach(lc, BaseBackupTargetTypeList)
+ {
+ BaseBackupTargetType *ttype = lfirst(lc);
+
+ if (strcmp(ttype->name, target) == 0)
+ {
+ BaseBackupTargetHandle *handle;
+
+ /* Found the target. */
+ handle = palloc(sizeof(BaseBackupTargetHandle));
+ handle->type = ttype;
+ handle->detail_arg = ttype->check_detail(target, target_detail);
+
+ return handle;
+ }
+ }
+
+ /* Did not find the target. */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unrecognized target: \"%s\"", target)));
+}
+
+/*
+ * Construct a bbsink that will implement the backup target.
+ *
+ * The get_sink function does all the real work, so all we have to do here
+ * is call it with the correct arguments. Whatever the check_detail function
+ * returned is here passed through to the get_sink function. This lets those
+ * two functions communicate with each other, if they wish. If not, the
+ * check_detail function can simply return the target_detail and let the
+ * get_sink function take it from there.
+ */
+bbsink *
+BaseBackupGetSink(BaseBackupTargetHandle *handle, bbsink *next_sink)
+{
+ return handle->type->get_sink(next_sink, handle->detail_arg);
+}
+
+/*
+ * Load predefined target types into BaseBackupTargetTypeList.
+ */
+static void
+initialize_target_list(void)
+{
+ BaseBackupTargetType *ttype = builtin_backup_targets;
+ MemoryContext oldcontext;
+
+ oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ while (ttype->name != NULL)
+ {
+ BaseBackupTargetTypeList = lappend(BaseBackupTargetTypeList, ttype);
+ ++ttype;
+ }
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Normally, a get_sink function should construct and return a new bbsink that
+ * implements the backup target, but the 'blackhole' target just throws the
+ * data away. We could implement that by adding a bbsink that does nothing
+ * but forward, but it's even cheaper to implement that by not adding a bbsink
+ * at all.
+ */
+bbsink *
+blackhole_get_sink(bbsink *next_sink, void *detail_arg)
+{
+ return next_sink;
+}
+
+/*
+ * Create a bbsink implementing a server-side backup.
+ */
+bbsink *
+server_get_sink(bbsink *next_sink, void *detail_arg)
+{
+ return bbsink_server_new(next_sink, detail_arg);
+}
+
+/*
+ * Implement target-detail checking for a target that does not accept a
+ * detail.
+ */
+void *
+reject_target_detail(char *target, char *target_detail)
+{
+ if (target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' does not accept a target detail",
+ target)));
+
+ return NULL;
+}
+
+/*
+ * Implement target-detail checking for a server-side backup.
+ *
+ * target_detail should be the name of the directory to which the backup
+ * should be written, but we don't check that here. Rather, that check,
+ * as well as the necessary permissions checking, happens in bbsink_server_new.
+ */
+void *
+server_check_detail(char *target, char *target_detail)
+{
+ if (target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("target '%s' requires a target detail",
+ target)));
+
+ return target_detail;
+}
diff --git a/src/include/replication/basebackup_target.h b/src/include/replication/basebackup_target.h
new file mode 100644
index 0000000000..e23ac29a89
--- /dev/null
+++ b/src/include/replication/basebackup_target.h
@@ -0,0 +1,66 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_target.h
+ * Extensibility framework for adding base backup targets.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * src/include/replication/basebackup_target.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BASEBACKUP_TARGET_H
+#define BASEBACKUP_TARGET_H
+
+#include "replication/basebackup_sink.h"
+
+struct BaseBackupTargetHandle;
+typedef struct BaseBackupTargetHandle BaseBackupTargetHandle;
+
+/*
+ * Extensions can call this function to create new backup targets.
+ *
+ * 'name' is the name of the new target.
+ *
+ * 'check_detail' is a function that accepts a target name and target detail
+ * and either throws an error (if the target detail is not valid or some other
+ * problem, such as a permissions issue, is detected) or returns a pointer to
+ * the data that will be needed to create a bbsink implementing that target.
+ * The second argument will be NULL if the TARGET_DETAIL option to the
+ * BASE_BACKUP command was not specified.
+ *
+ * 'get_sink' is a function that creates the bbsink. The first argument
+ * is the successor sink; the sink created by this function should always
+ * forward to this sink. The second argument is the pointer returned by a
+ * previous call to the 'check_detail' function.
+ *
+ * In practice, a user will type something like "pg_basebackup --target foo:bar
+ * -Xfetch". That will cause the server to look for a backup target named
+ * "foo". If one is found, the check_detail callback will be invoked for the
+ * string "bar", and whatever that callback returns will be passed as the
+ * second argument to the get_sink callback.
+ */
+extern void BaseBackupAddTarget(char *name,
+ void *(*check_detail) (char *, char *),
+ bbsink * (*get_sink) (bbsink *, void *));
+
+/*
+ * These functions are used by the core code to access base backup targets
+ * added via BaseBackupAddTarget(). The core code will pass the TARGET and
+ * TARGET_DETAIL strings obtained from the user to BaseBackupGetTargetHandle,
+ * which will either throw an error (if the TARGET is not recognized or the
+ * check_detail hook for that TARGET doesn't like the TARGET_DETAIL) or
+ * return a BaseBackupTargetHandle object that can later be passed to
+ * BaseBackupGetSink.
+ *
+ * BaseBackupGetSink constructs a bbsink implementing the desired target
+ * using the BaseBackupTargetHandle and the successor bbsink. It does this
+ * by arranging to call the get_sink() callback provided by the extension
+ * that implements the base backup target.
+ */
+extern BaseBackupTargetHandle *BaseBackupGetTargetHandle(char *target,
+ char *target_detail);
+extern bbsink *BaseBackupGetSink(BaseBackupTargetHandle *handle,
+ bbsink *next_sink);
+
+#endif
--
2.24.3 (Apple Git-128)
v2-0002-Add-basebackup_to_shell-contrib-module.patch
From 853c6fcfc425e2ece375f5bec73d94d000efb338 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 15 Feb 2022 11:24:00 -0500
Subject: [PATCH v2 2/2] Add 'basebackup_to_shell' contrib module.
As a demonstration of the sort of thing that can be done by adding a
custom backup target, this defines a 'shell' target which executes a
command defined by the system administrator. The command is executed
once for each tar archive generated by the backup and once for the
backup manifest, if any. Each time the command is executed, it
receives the contents of the file for which it is executed via standard
input.
The configured command can use %f to refer to the name of the archive
(e.g. base.tar, $TABLESPACE_OID.tar, backup_manifest) and %d to refer
to the target detail (pg_basebackup --target shell:DETAIL). A target
detail is required if %d appears in the configured command and
forbidden if it does not.
---
contrib/Makefile | 1 +
contrib/basebackup_to_shell/Makefile | 19 +
.../basebackup_to_shell/basebackup_to_shell.c | 419 ++++++++++++++++++
doc/src/sgml/basebackup-to-shell.sgml | 69 +++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
6 files changed, 510 insertions(+)
create mode 100644 contrib/basebackup_to_shell/Makefile
create mode 100644 contrib/basebackup_to_shell/basebackup_to_shell.c
create mode 100644 doc/src/sgml/basebackup-to-shell.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index e3e221308b..332b486ecc 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -10,6 +10,7 @@ SUBDIRS = \
auth_delay \
auto_explain \
basic_archive \
+ basebackup_to_shell \
bloom \
btree_gin \
btree_gist \
diff --git a/contrib/basebackup_to_shell/Makefile b/contrib/basebackup_to_shell/Makefile
new file mode 100644
index 0000000000..f31dfaae9c
--- /dev/null
+++ b/contrib/basebackup_to_shell/Makefile
@@ -0,0 +1,19 @@
+# contrib/basebackup_to_shell/Makefile
+
+MODULE_big = basebackup_to_shell
+OBJS = \
+ $(WIN32RES) \
+ basebackup_to_shell.o
+
+PGFILEDESC = "basebackup_to_shell - target basebackup to shell command"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/basebackup_to_shell
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/basebackup_to_shell/basebackup_to_shell.c b/contrib/basebackup_to_shell/basebackup_to_shell.c
new file mode 100644
index 0000000000..d82cb6d13f
--- /dev/null
+++ b/contrib/basebackup_to_shell/basebackup_to_shell.c
@@ -0,0 +1,419 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_to_shell.c
+ * target base backup files to a shell command
+ *
+ * Copyright (c) 2016-2022, PostgreSQL Global Development Group
+ *
+ * contrib/basebackup_to_shell/basebackup_to_shell.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "miscadmin.h"
+#include "replication/basebackup_target.h"
+#include "storage/fd.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+typedef struct bbsink_shell
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* User-supplied target detail string. */
+ char *target_detail;
+
+ /* Shell command pattern being used for this backup. */
+ char *shell_command;
+
+ /* The command that is currently running. */
+ char *current_command;
+
+ /* Pipe to the running command. */
+ FILE *pipe;
+} bbsink_shell;
+
+void _PG_init(void);
+
+static void *shell_check_detail(char *target, char *target_detail);
+static bbsink *shell_get_sink(bbsink *next_sink, void *detail_arg);
+
+static void bbsink_shell_begin_archive(bbsink *sink,
+ const char *archive_name);
+static void bbsink_shell_archive_contents(bbsink *sink, size_t len);
+static void bbsink_shell_end_archive(bbsink *sink);
+static void bbsink_shell_begin_manifest(bbsink *sink);
+static void bbsink_shell_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_shell_end_manifest(bbsink *sink);
+
+const bbsink_ops bbsink_shell_ops = {
+ .begin_backup = bbsink_forward_begin_backup,
+ .begin_archive = bbsink_shell_begin_archive,
+ .archive_contents = bbsink_shell_archive_contents,
+ .end_archive = bbsink_shell_end_archive,
+ .begin_manifest = bbsink_shell_begin_manifest,
+ .manifest_contents = bbsink_shell_manifest_contents,
+ .end_manifest = bbsink_shell_end_manifest,
+ .end_backup = bbsink_forward_end_backup,
+ .cleanup = bbsink_forward_cleanup
+};
+
+static char *shell_command = "";
+static char *shell_required_role = "";
+
+void
+_PG_init(void)
+{
+ DefineCustomStringVariable("basebackup_to_shell.command",
+ "Shell command to be executed for each backup file.",
+ NULL,
+ &shell_command,
+ "",
+ PGC_SIGHUP,
+ 0,
+ NULL, NULL, NULL);
+
+ DefineCustomStringVariable("basebackup_to_shell.required_role",
+ "Backup user must be a member of this role to use shell backup target.",
+ NULL,
+ &shell_required_role,
+ "",
+ PGC_SIGHUP,
+ 0,
+ NULL, NULL, NULL);
+
+ BaseBackupAddTarget("shell", shell_check_detail, shell_get_sink);
+}
+
+/*
+ * We choose to defer sanity checking until shell_get_sink(), and so
+ * just pass the target detail through without doing anything. However, we do
+ * permissions checks here, before any real work has been done.
+ */
+static void *
+shell_check_detail(char *target, char *target_detail)
+{
+ if (shell_required_role[0] != '\0')
+ {
+ Oid roleid;
+
+ StartTransactionCommand();
+ roleid = get_role_oid(shell_required_role, true);
+ if (!is_member_of_role(GetUserId(), roleid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to use basebackup_to_shell")));
+ CommitTransactionCommand();
+ }
+
+ return target_detail;
+}
+
+/*
+ * Set up a bbsink to implement this base backup target.
+ *
+ * This is also a convenient place to sanity check that a target detail was
+ * given if and only if %d is present.
+ */
+static bbsink *
+shell_get_sink(bbsink *next_sink, void *detail_arg)
+{
+ bbsink_shell *sink;
+ bool has_detail_escape = false;
+ char *c;
+
+ /*
+ * Set up the bbsink.
+ *
+ * We remember the current value of basebackup_to_shell.shell_command to
+ * be certain that it can't change under us during the backup.
+ */
+ sink = palloc0(sizeof(bbsink_shell));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_shell_ops;
+ sink->base.bbs_next = next_sink;
+ sink->target_detail = detail_arg;
+ sink->shell_command = pstrdup(shell_command);
+
+ /* Reject an empty shell command. */
+ if (sink->shell_command[0] == '\0')
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("shell command for backup is not configured"));
+
+ /* Determine whether the shell command we're using contains %d. */
+ for (c = sink->shell_command; *c != '\0'; ++c)
+ {
+ if (c[0] == '%' && c[1] != '\0')
+ {
+ if (c[1] == 'd')
+ has_detail_escape = true;
+ ++c;
+ }
+ }
+
+ /* There should be a target detail if %d was used, and not otherwise. */
+ if (has_detail_escape && sink->target_detail == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a target detail is required because the configured command includes %%d"),
+ errhint("Try \"pg_basebackup --target shell:DETAIL ...\"")));
+ else if (!has_detail_escape && sink->target_detail != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a target detail is not permitted because the configured command does not include %%d")));
+
+ /*
+ * Since we're passing the string provided by the user to popen(), it will
+ * be interpreted by the shell, which is a potential security
+ * vulnerability, since the user invoking this module is not necessarily
+ * a superuser. To stay out of trouble, we must disallow any shell
+ * metacharacters here; to be conservative and keep things simple, we
+ * allow only alphanumerics.
+ */
+ if (sink->target_detail != NULL)
+ {
+ char *d;
+ bool scary = false;
+
+ for (d = sink->target_detail; *d != '\0'; ++d)
+ {
+ if (*d >= 'a' && *d <= 'z')
+ continue;
+ if (*d >= 'A' && *d <= 'Z')
+ continue;
+ if (*d >= '0' && *d <= '9')
+ continue;
+ scary = true;
+ break;
+ }
+
+ if (scary)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("target detail must contain only alphanumeric characters"));
+ }
+
+ return &sink->base;
+}
+
+/*
+ * Construct the exact shell command that we're actually going to run,
+ * making substitutions as appropriate for escape sequences.
+ */
+static char *
+shell_construct_command(char *base_command, const char *filename,
+ char *target_detail)
+{
+ StringInfoData buf;
+ char *c;
+
+ initStringInfo(&buf);
+ for (c = base_command; *c != '\0'; ++c)
+ {
+ /* Anything other than '%' is copied verbatim. */
+ if (*c != '%')
+ {
+ appendStringInfoChar(&buf, *c);
+ continue;
+ }
+
+ /* Any time we see '%' we eat the following character as well. */
+ ++c;
+
+ /*
+ * The following character determines what we insert here, or may
+ * cause us to throw an error.
+ */
+ if (*c == '%')
+ {
+ /* '%%' is replaced by a single '%' */
+ appendStringInfoChar(&buf, '%');
+ }
+ else if (*c == 'f')
+ {
+ /* '%f' is replaced by the filename */
+ appendStringInfoString(&buf, filename);
+ }
+ else if (*c == 'd')
+ {
+ /* '%d' is replaced by the target detail */
+ appendStringInfoString(&buf, target_detail);
+ }
+ else if (*c == '\0')
+ {
+ /* Incomplete escape sequence, expected a character afterward */
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("shell command ends unexpectedly after escape character \"%%\""));
+ }
+ else
+ {
+ /* Unknown escape sequence */
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("shell command contains unexpected escape sequence \"%c\"",
+ *c));
+ }
+ }
+
+ return buf.data;
+}
+
+/*
+ * Finish executing the shell command once all data has been written.
+ */
+static void
+shell_finish_command(bbsink_shell *sink)
+{
+ int pclose_rc;
+
+ /* There should be a command running. */
+ Assert(sink->current_command != NULL);
+ Assert(sink->pipe != NULL);
+
+ /* Close down the pipe we opened. */
+ pclose_rc = ClosePipeStream(sink->pipe);
+ if (pclose_rc == -1)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not close pipe to external command: %m")));
+ else if (pclose_rc != 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+ errmsg("shell command \"%s\" failed",
+ sink->current_command),
+ errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+ }
+
+ /* Clean up. */
+ sink->pipe = NULL;
+ pfree(sink->current_command);
+ sink->current_command = NULL;
+}
+
+/*
+ * Start up the shell command, substituting %f in for the current filename.
+ */
+static void
+shell_run_command(bbsink_shell *sink, const char *filename)
+{
+ /* There should not be anything already running. */
+ Assert(sink->current_command == NULL);
+ Assert(sink->pipe == NULL);
+
+ /* Construct a suitable command. */
+ sink->current_command = shell_construct_command(sink->shell_command,
+ filename,
+ sink->target_detail);
+
+ /* Run it. */
+ sink->pipe = OpenPipeStream(sink->current_command, PG_BINARY_W);
+}
+
+/*
+ * Send accumulated data to the running shell command.
+ */
+static void
+shell_send_data(bbsink_shell *sink, size_t len)
+{
+ /* There should be a command running. */
+ Assert(sink->current_command != NULL);
+ Assert(sink->pipe != NULL);
+
+ /* Try to write the data. */
+ if (fwrite(sink->base.bbs_buffer, len, 1, sink->pipe) != 1 ||
+ ferror(sink->pipe))
+ {
+ if (errno == EPIPE)
+ {
+ /*
+ * The error we're about to throw would shut down the command
+ * anyway, but we may get a more meaningful error message by
+ * doing this. If not, we'll fall through to the generic error
+ * below.
+ */
+ shell_finish_command(sink);
+ errno = EPIPE;
+ }
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to shell backup program: %m")));
+ }
+}
+
+/*
+ * At start of archive, start up the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_run_command(mysink, archive_name);
+ bbsink_forward_begin_archive(sink, archive_name);
+}
+
+/*
+ * Send archive contents to command's stdin and forward to next sink.
+ */
+static void
+bbsink_shell_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_send_data(mysink, len);
+ bbsink_forward_archive_contents(sink, len);
+}
+
+/*
+ * At end of archive, shut down the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_end_archive(bbsink *sink)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_finish_command(mysink);
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * At start of manifest, start up the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_begin_manifest(bbsink *sink)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_run_command(mysink, "backup_manifest");
+ bbsink_forward_begin_manifest(sink);
+}
+
+/*
+ * Send manifest contents to command's stdin and forward to next sink.
+ */
+static void
+bbsink_shell_manifest_contents(bbsink *sink, size_t len)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_send_data(mysink, len);
+ bbsink_forward_manifest_contents(sink, len);
+}
+
+/*
+ * At end of manifest, shut down the shell command and forward to next sink.
+ */
+static void
+bbsink_shell_end_manifest(bbsink *sink)
+{
+ bbsink_shell *mysink = (bbsink_shell *) sink;
+
+ shell_finish_command(mysink);
+ bbsink_forward_end_manifest(sink);
+}
diff --git a/doc/src/sgml/basebackup-to-shell.sgml b/doc/src/sgml/basebackup-to-shell.sgml
new file mode 100644
index 0000000000..f36f37e510
--- /dev/null
+++ b/doc/src/sgml/basebackup-to-shell.sgml
@@ -0,0 +1,69 @@
+<!-- doc/src/sgml/basebackup-to-shell.sgml -->
+
+<sect1 id="basebackup-to-shell" xreflabel="basebackup_to_shell">
+ <title>basebackup_to_shell</title>
+
+ <indexterm zone="basebackup-to-shell">
+ <primary>basebackup_to_shell</primary>
+ </indexterm>
+
+ <para>
+ <filename>basebackup_to_shell</filename> adds a custom basebackup target
+ called <literal>shell</literal>. This makes it possible to run
+ <literal>pg_basebackup --target=shell</literal> or, depending on how this
+ module is configured,
+ <literal>pg_basebackup --target=shell:DETAIL_STRING</literal>, and cause
+ a server command chosen by the server administrator to be executed for
+ each tar archive generated by the backup process. The command will receive
+ the contents of the archive via standard input.
+ </para>
+
+ <para>
+ This module is primarily intended as an example of how to create new
+ backup targets via an extension module, but in some scenarios it may be
+ useful for its own sake.
+ In order to function, this module must be loaded via
+ <xref linkend="guc-shared-preload-libraries"/> or
+ <xref linkend="guc-local-preload-libraries"/>.
+ </para>
+
+ <sect2>
+ <title>Configuration Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <varname>basebackup_to_shell.command</varname> (<type>string</type>)
+ <indexterm>
+ <primary><varname>basebackup_to_shell.command</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ The command which the server should execute for each archive generated
+ by the backup process. If <literal>%f</literal> occurs in the command
+ string, it will be replaced by the name of the archive (e.g.
+ <literal>base.tar</literal>). If <literal>%d</literal> occurs in the
+ command string, it will be replaced by the target detail provided by
+ the user. A target detail is required if <literal>%d</literal> is
+ used in the command string, and prohibited otherwise. For security
+ reasons, it may contain only alphanumeric characters. If
+ <literal>%%</literal> occurs in the command string, it will be replaced
+ by a single <literal>%</literal>. If <literal>%</literal> occurs in
+ the command string followed by any other character or at the end of the
+ string, an error occurs.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+
+ <para>
+ Robert Haas <email>rhaas@postgresql.org</email>
+ </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index be9711c6f2..1e42ce1a7f 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -99,6 +99,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&amcheck;
&auth-delay;
&auto-explain;
+ &basebackup-to-shell;
&basic-archive;
&bloom;
&btree-gin;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 328cd1f378..fd853af01f 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -114,6 +114,7 @@
<!ENTITY auth-delay SYSTEM "auth-delay.sgml">
<!ENTITY auto-explain SYSTEM "auto-explain.sgml">
<!ENTITY basic-archive SYSTEM "basic-archive.sgml">
+<!ENTITY basebackup-to-shell SYSTEM "basebackup-to-shell.sgml">
<!ENTITY bloom SYSTEM "bloom.sgml">
<!ENTITY btree-gin SYSTEM "btree-gin.sgml">
<!ENTITY btree-gist SYSTEM "btree-gist.sgml">
--
2.24.3 (Apple Git-128)
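As a concrete illustration of how this target is meant to be wired up,
here is a minimal sketch; the command string, paths, and target detail
below are invented for the example, not taken from the patch:

    # postgresql.conf
    shared_preload_libraries = 'basebackup_to_shell'
    basebackup_to_shell.command = 'cat > /var/backups/%d.%f'

    $ pg_basebackup --target=shell:nightly -Xnone

With that configuration, the target detail "nightly" is substituted for
%d and each archive name (e.g. base.tar) for %f, so the server runs one
shell invocation per archive plus one for the backup manifest, writing
/var/backups/nightly.base.tar and /var/backups/nightly.backup_manifest.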
On 2/15/22 6:48 PM, Jeevan Ladhe wrote:
Please find the attached updated version of patch for ZSTD server side
Thanks, Jeevan. I tested again with the attached patch and, as
mentioned, the crash is fixed now.
Also, I tested different compression levels of gzip vs. zstd against a
data directory of size 29GB, and found these results:
====
./pg_basebackup -t server:/tmp/<directory>
--compress=server-zstd:<level> -Xnone -n -N --no-estimate-size -v
--compress=server-zstd:1 = compress directory size is 1.3GB
--compress=server-zstd:4 = compress directory size is 1.3GB
--compress=server-zstd:7 = compress directory size is 1.2GB
--compress=server-zstd:12 = compress directory size is 1.2GB
====
===
./pg_basebackup -t server:/tmp/<directory>
--compress=server-gzip:<level> -Xnone -n -N --no-estimate-size -v
--compress=server-gzip:1 = compress directory size is 1.8GB
--compress=server-gzip:4 = compress directory size is 1.6GB
--compress=server-gzip:9 = compress directory size is 1.6GB
===
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
+++ b/configure
@@ -801,6 +805,7 @@ infodir
docdir
oldincludedir
includedir
+runstatedir
There are superfluous changes to ./configure unrelated to the changes in
configure.ac. Probably because you're using a different version of autotools,
or a vendor's patched copy. You can remove the changes with git checkout -p or
similar.
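For example, one way to do that (a sketch, run from the top of the
source tree) is:

    $ git checkout -p -- configure

which lets you interactively discard the unrelated hunks; alternatively,
regenerate the file with the autoconf version the tree expects, so that
only the intended changes remain.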
+++ b/src/backend/replication/basebackup_zstd.c
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+#else
This should have a return; like what's added by 71cbbbbe8 and 302612a6c.
Also, the parens() around errcode aren't needed since last year.
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 22);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,
This looks like dead code in assert builds.
If it's unreachable, it can be elog().
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for then next sink.
*the next sink ?
Does anyone plan to include this for pg15 ? If so, I think at least the WAL
compression should have support added too. I'd plan to rebase Michael's patch.
/messages/by-id/YNqWd2GSMrnqWIfx@paquier.xyz
--
Justin
Thanks Tushar for the testing.
I further worked on ZSTD and now have implemented client side
compression as well. Attached are the patches for both server-side and
client-side compression.
Patch 0001 is the server-side patch; it has not changed since the last
version (v10), other than bumping the version number.
Patch 0002 is the client-side compression patch.
Regards,
Jeevan Ladhe
On Tue, 15 Feb 2022 at 22:24, tushar <tushar.ahuja@enterprisedb.com> wrote:
Attachments:
v11-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patch
From a494ec33c2b72176afd3f7decfe571c969133012 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Tue, 15 Feb 2022 18:45:52 +0530
Subject: [PATCH 1/2] Add a ZSTD compression method for server side
compression.
This patch introduces --compress=server-zstd[:LEVEL]
Add tap test.
Add config option --with-zstd.
Add documentation for ZSTD option.
Add pg_basebackup help for ZSTD option.
Example:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-zstd:4
---
configure | 295 +++++++++++++++++++++-
configure.ac | 33 +++
doc/src/sgml/protocol.sgml | 5 +-
doc/src/sgml/ref/pg_basebackup.sgml | 38 +--
src/Makefile.global.in | 1 +
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_zstd.c | 294 +++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 18 +-
src/bin/pg_basebackup/pg_receivewal.c | 4 +
src/bin/pg_basebackup/walmethods.h | 1 +
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 9 +
src/include/pg_config.h.in | 6 +
src/include/replication/basebackup_sink.h | 1 +
15 files changed, 686 insertions(+), 28 deletions(-)
create mode 100644 src/backend/replication/basebackup_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/configure b/configure
index 9305555658..fc83c17c68 100755
--- a/configure
+++ b/configure
@@ -650,6 +650,7 @@ CFLAGS_ARMV8_CRC32C
CFLAGS_SSE42
have_win32_dbghelp
LIBOBJS
+ZSTD
LZ4
UUID_LIBS
LDAP_LIBS_BE
@@ -700,6 +701,9 @@ with_gnu_ld
LD
LDFLAGS_SL
LDFLAGS_EX
+ZSTD_LIBS
+ZSTD_CFLAGS
+with_zstd
LZ4_LIBS
LZ4_CFLAGS
with_lz4
@@ -801,6 +805,7 @@ infodir
docdir
oldincludedir
includedir
+runstatedir
localstatedir
sharedstatedir
sysconfdir
@@ -869,6 +874,7 @@ with_libxslt
with_system_tzdata
with_zlib
with_lz4
+with_zstd
with_gnu_ld
with_ssl
with_openssl
@@ -898,6 +904,8 @@ XML2_CFLAGS
XML2_LIBS
LZ4_CFLAGS
LZ4_LIBS
+ZSTD_CFLAGS
+ZSTD_LIBS
LDFLAGS_EX
LDFLAGS_SL
PERL
@@ -942,6 +950,7 @@ datadir='${datarootdir}'
sysconfdir='${prefix}/etc'
sharedstatedir='${prefix}/com'
localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
includedir='${prefix}/include'
oldincludedir='/usr/include'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1194,6 +1203,15 @@ do
| -silent | --silent | --silen | --sile | --sil)
silent=yes ;;
+ -runstatedir | --runstatedir | --runstatedi | --runstated \
+ | --runstate | --runstat | --runsta | --runst | --runs \
+ | --run | --ru | --r)
+ ac_prev=runstatedir ;;
+ -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+ | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+ | --run=* | --ru=* | --r=*)
+ runstatedir=$ac_optarg ;;
+
-sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
ac_prev=sbindir ;;
-sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1331,7 +1349,7 @@ fi
for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
- libdir localedir mandir
+ libdir localedir mandir runstatedir
do
eval ac_val=\$$ac_var
# Remove trailing slashes.
@@ -1484,6 +1502,7 @@ Fine tuning of the installation directories:
--sysconfdir=DIR read-only single-machine data [PREFIX/etc]
--sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
--localstatedir=DIR modifiable single-machine data [PREFIX/var]
+ --runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run]
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -1577,6 +1596,7 @@ Optional Packages:
use system time zone data in DIR
--without-zlib do not use Zlib
--with-lz4 build with LZ4 support
+ --with-zstd build with ZSTD support
--with-gnu-ld assume the C compiler uses GNU ld [default=no]
--with-ssl=LIB use LIB for SSL/TLS support (openssl)
--with-openssl obsolete spelling of --with-ssl=openssl
@@ -1606,6 +1626,8 @@ Some influential environment variables:
XML2_LIBS linker flags for XML2, overriding pkg-config
LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config
LZ4_LIBS linker flags for LZ4, overriding pkg-config
+ ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config
+ ZSTD_LIBS linker flags for ZSTD, overriding pkg-config
LDFLAGS_EX extra linker flags for linking executables only
LDFLAGS_SL extra linker flags for linking shared libraries only
PERL Perl program
@@ -9034,6 +9056,146 @@ fi
done
fi
+#
+# ZSTD
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with ZSTD support" >&5
+$as_echo_n "checking whether to build with ZSTD support... " >&6; }
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+ withval=$with_zstd;
+ case $withval in
+ yes)
+
+$as_echo "#define USE_ZSTD 1" >>confdefs.h
+
+ ;;
+ no)
+ :
+ ;;
+ *)
+ as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+ ;;
+ esac
+
+else
+ with_zstd=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_zstd" >&5
+$as_echo "$with_zstd" >&6; }
+
+
+if test "$with_zstd" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
+$as_echo_n "checking for libzstd... " >&6; }
+
+if test -n "$ZSTD_CFLAGS"; then
+ pkg_cv_ZSTD_CFLAGS="$ZSTD_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+if test -n "$ZSTD_LIBS"; then
+ pkg_cv_ZSTD_LIBS="$ZSTD_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+ _pkg_short_errors_supported=yes
+else
+ _pkg_short_errors_supported=no
+fi
+ if test $_pkg_short_errors_supported = yes; then
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libzstd" 2>&1`
+ else
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libzstd" 2>&1`
+ fi
+ # Put the nasty error message in config.log where it belongs
+ echo "$ZSTD_PKG_ERRORS" >&5
+
+ as_fn_error $? "Package requirements (libzstd) were not met:
+
+$ZSTD_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+ ZSTD_CFLAGS=$pkg_cv_ZSTD_CFLAGS
+ ZSTD_LIBS=$pkg_cv_ZSTD_LIBS
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -13130,6 +13292,56 @@ fi
fi
+if test "$with_zstd" = yes ; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+ Use char because int might match the return type of a GCC
+ builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+ ;
+ return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+ ac_cv_lib_zstd_ZSTD_compress=yes
+else
+ ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+ LIBS="-lzstd $LIBS"
+
+else
+ as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -13904,6 +14116,77 @@ done
fi
+if test -z "$ZSTD"; then
+ for ac_prog in zstd
+do
+ # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if ${ac_cv_path_ZSTD+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ case $ZSTD in
+ [\\/]* | ?:[\\/]*)
+ ac_cv_path_ZSTD="$ZSTD" # Let the user override the test with a path.
+ ;;
+ *)
+ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+ IFS=$as_save_IFS
+ test -z "$as_dir" && as_dir=.
+ for ac_exec_ext in '' $ac_executable_extensions; do
+ if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
+ ac_cv_path_ZSTD="$as_dir/$ac_word$ac_exec_ext"
+ $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+ break 2
+ fi
+done
+ done
+IFS=$as_save_IFS
+
+ ;;
+esac
+fi
+ZSTD=$ac_cv_path_ZSTD
+if test -n "$ZSTD"; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+else
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+ test -n "$ZSTD" && break
+done
+
+else
+ # Report the value of ZSTD in configure's output in all cases.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD" >&5
+$as_echo_n "checking for ZSTD... " >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+fi
+
+if test "$with_zstd" = yes; then
+ for ac_header in zstd.h
+do :
+ ac_fn_c_check_header_mongrel "$LINENO" "zstd.h" "ac_cv_header_zstd_h" "$ac_includes_default"
+if test "x$ac_cv_header_zstd_h" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_ZSTD_H 1
+_ACEOF
+
+else
+ as_fn_error $? "zstd.h header file is required for ZSTD" "$LINENO" 5
+fi
+
+done
+
+fi
+
if test "$with_gssapi" = yes ; then
for ac_header in gssapi/gssapi.h
do :
@@ -15307,7 +15590,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15353,7 +15636,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15377,7 +15660,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15422,7 +15705,7 @@ else
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
@@ -15446,7 +15729,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
We can't simply define LARGE_OFF_T to be 9223372036854775807,
since some C++ compilers masquerading as C compilers
incorrectly reject 9223372036854775807. */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
&& LARGE_OFF_T % 2147483647 == 1)
? 1 : -1];
diff --git a/configure.ac b/configure.ac
index 16167329fc..729b23fbea 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1056,6 +1056,30 @@ if test "$with_lz4" = yes; then
done
fi
+#
+# ZSTD
+#
+AC_MSG_CHECKING([whether to build with ZSTD support])
+PGAC_ARG_BOOL(with, zstd, no, [build with ZSTD support],
+ [AC_DEFINE([USE_ZSTD], 1, [Define to 1 to build with ZSTD support. (--with-zstd)])])
+AC_MSG_RESULT([$with_zstd])
+AC_SUBST(with_zstd)
+
+if test "$with_zstd" = yes; then
+ PKG_CHECK_MODULES(ZSTD, libzstd)
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -1325,6 +1349,10 @@ if test "$with_lz4" = yes ; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
+if test "$with_zstd" = yes ; then
+ AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])])
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -1490,6 +1518,11 @@ if test "$with_lz4" = yes; then
AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])
fi
+PGAC_PATH_PROGS(ZSTD, zstd)
+if test "$with_zstd" = yes; then
+ AC_CHECK_HEADERS(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])])
+fi
+
if test "$with_gssapi" = yes ; then
AC_CHECK_HEADERS(gssapi/gssapi.h, [],
[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1c5ab00879..c13d25051c 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2725,7 +2725,7 @@ The commands accepted in replication mode are:
<para>
Instructs the server to compress the backup using the specified
method. Currently, the supported methods are <literal>gzip</literal>
- and <literal>lz4</literal>.
+ <literal>lz4</literal>, and <literal>zstd</literal>.
</para>
</listitem>
</varlistentry>
@@ -2737,7 +2737,8 @@ The commands accepted in replication mode are:
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
For <literal>gzip</literal> the value should be an integer between 1
- and 9, and for <literal>lz4</literal> it should be between 1 and 12.
+ and 9, for <literal>lz4</literal> between 1 and 12, and for
+ <literal>zstd</literal> it should be between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd1..4cf28a2a61 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,30 +417,32 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to <literal>gzip</literal> or
- <literal>lz4</literal>, or <literal>none</literal> for no
- compression. A compression level can be optionally specified, by
- appending the level number after a colon (<literal>:</literal>). If no
- level is specified, the default compression level will be used. If
- only a level is specified without mentioning an algorithm,
- <literal>gzip</literal> compression will be used if the level is
- greater than 0, and no compression will be used if the level is 0.
- </para>
- <para>
- When the tar format is used with <literal>gzip</literal> or
- <literal>lz4</literal>, the suffix <filename>.gz</filename> or
- <filename>.lz4</filename> will automatically be added to all tar
- filenames. When the plain format is used, client-side compression may
- not be specified, but it is still possible to request server-side
- compression. If this is done, the server will compress the backup for
- transmission, and the client will decompress and extract it.
+ The compression method can be set to <literal>gzip</literal>,
+ <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for no compression. A compression level can
+ optionally be specified, by appending the level number after a colon
+ (<literal>:</literal>). If no level is specified, the default
+ compression level will be used. If only a level is specified without
+ mentioning an algorithm, <literal>gzip</literal> compression will be
+ used if the level is greater than 0, and no compression will be used if
+ the level is 0.
+ </para>
+ <para>
+ When the tar format is used with <literal>gzip</literal>,
+ <literal>lz4</literal>, or <literal>zstd</literal>, the suffix
+ <filename>.gz</filename>, <filename>.lz4</filename>, or
+ <filename>.zst</filename> respectively will be automatically added to
+ all tar filenames. When the plain format is used, client-side
+ compression may not be specified, but it is still possible to request
+ server-side compression. If this is done, the server will compress the
+ backup for transmission, and the client will decompress and extract it.
</para>
<para>
When this option is used in combination with
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
compression is selected, but will not be compressed if server-side
- compresion or LZ4 compresion is selected.
+ compression, LZ4, or ZSTD compression is selected.
</para>
</listitem>
</varlistentry>
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 9dcd54fcbd..c980444233 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -351,6 +351,7 @@ XGETTEXT = @XGETTEXT@
GZIP = gzip
BZIP2 = bzip2
LZ4 = @LZ4@
+ZSTD = @ZSTD@
DOWNLOAD = wget -O $@ --no-use-server-timestamps
#DOWNLOAD = curl -o $@
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..2e6de7007f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -20,6 +20,7 @@ OBJS = \
basebackup_copy.o \
basebackup_gzip.o \
basebackup_lz4.o \
+ basebackup_zstd.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0bf28b55d7..2378ce5c5e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
{
BACKUP_COMPRESSION_NONE,
BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
} basebackup_compression_type;
typedef struct
@@ -906,6 +907,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
else if (strcmp(optval, "lz4") == 0)
opt->compression = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(optval, "zstd") == 0)
+ opt->compression = BACKUP_COMPRESSION_ZSTD;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1026,6 +1029,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
sink = bbsink_gzip_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
+ sink = bbsink_zstd_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
new file mode 100644
index 0000000000..d99b3698f6
--- /dev/null
+++ b/src/backend/replication/basebackup_zstd.c
@@ -0,0 +1,294 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_zstd.c
+ * Basebackup sink implementing zstd compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbsink_zstd
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level */
+ int compresslevel;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbsink_zstd;
+
+static void bbsink_zstd_begin_backup(bbsink *sink);
+static void bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_zstd_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_zstd_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_zstd_end_archive(bbsink *sink);
+static void bbsink_zstd_cleanup(bbsink *sink);
+static void bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+const bbsink_ops bbsink_zstd_ops = {
+ .begin_backup = bbsink_zstd_begin_backup,
+ .begin_archive = bbsink_zstd_begin_archive,
+ .archive_contents = bbsink_zstd_archive_contents,
+ .end_archive = bbsink_zstd_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_zstd_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_zstd_end_backup,
+ .cleanup = bbsink_zstd_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs zstd compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+#else
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 22);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_zstd));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_zstd_begin_backup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t output_buffer_bound;
+
+ mysink->cctx = ZSTD_createCCtx();
+ if (!mysink->cctx)
+ elog(ERROR, "could not create zstd compression context");
+
+ ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Make sure that the next sink's bbs_buffer is big enough to accommodate
+ * the compressed input buffer.
+ */
+ output_buffer_bound = ZSTD_compressBound(mysink->base.bbs_buffer_length);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ char *zstd_archive_name;
+
+ /*
+ * At the start of each archive we reset the state to start a new
+ * compression operation. The parameters are sticky and they would stick
+ * around as we are resetting with option ZSTD_reset_session_only.
+ */
+ ZSTD_CCtx_reset(mysink->cctx, ZSTD_reset_session_only);
+
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ /* Add ".zst" to the archive name. */
+ zstd_archive_name = psprintf("%s.zst", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, zstd_archive_name);
+ pfree(zstd_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for then next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_zstd_end_archive() is invoked.
+ */
+static void
+bbsink_zstd_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ ZSTD_inBuffer inBuf = {mysink->base.bbs_buffer, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx, &mysink->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * There might be some data inside zstd's internal buffers; we need to get
+ * that flushed out, end the zstd frame, and then forward the result to the
+ * successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_zstd_end_archive(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx,
+ &mysink->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ if (mysink->zstd_outBuf.pos > 0)
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Free the resources and context.
+ */
+static void
+bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+
+ bbsink_forward_end_backup(sink, endptr, endtli);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling ZSTD_freeCCtx if needed, so as to avoid a memory leak.
+ */
+static void
+bbsink_zstd_cleanup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context if not already released. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0003b59615..3adb3a3845 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress={[{client|server}-]{gzip|lz4|zstd}}[:LEVEL]|none}\n"
" compress tar output with given compression method or level\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -1023,6 +1023,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1970,6 +1975,9 @@ BaseBackup(void)
case COMPRESSION_LZ4:
compressmethodstr = "lz4";
break;
+ case COMPRESSION_ZSTD:
+ compressmethodstr = "zstd";
+ break;
default:
Assert(false);
break;
@@ -2819,6 +2827,14 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case COMPRESSION_ZSTD:
+ if (compresslevel > 22)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 22",
+ compresslevel, "zstd");
+ exit(1);
+ }
+ break;
}
/*
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ccb215c398..9b7656c692 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -904,6 +904,10 @@ main(int argc, char **argv)
exit(1);
#endif
break;
+ case COMPRESSION_ZSTD:
+ pg_log_error("compression with %s is not yet supported", "ZSTD");
+ exit(1);
+
}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 2dfb353baa..ec54019cfc 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -24,6 +24,7 @@ typedef enum
{
COMPRESSION_GZIP,
COMPRESSION_LZ4,
+ COMPRESSION_ZSTD,
COMPRESSION_NONE
} WalCompressionMethod;
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 851233a6e0..596df15118 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -10,6 +10,7 @@ export TAR
# name.
export GZIP_PROGRAM=$(GZIP)
export LZ4=$(LZ4)
+export ZSTD=$(ZSTD)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index 6927ca4c74..1ccc6cb9df
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -43,6 +43,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d', '-m'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
@@ -108,6 +116,7 @@ for my $tc (@test_configuration)
# Cleanup.
unlink($backup_path . '/backup_manifest');
unlink($backup_path . '/base.tar');
+ unlink($backup_path . '/' . $tc->{'backup_archive'});
rmtree($extract_path);
}
}
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 28a1f0e9f0..26e373e9f7 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -325,6 +325,9 @@
/* Define to 1 if you have the `lz4' library (-llz4). */
#undef HAVE_LIBLZ4
+/* Define to 1 if you have the `zstd' library (-lzstd). */
+#undef HAVE_LIBZSTD
+
/* Define to 1 if you have the `m' library (-lm). */
#undef HAVE_LIBM
@@ -367,6 +370,9 @@
/* Define to 1 if you have the <lz4.h> header file. */
#undef HAVE_LZ4_H
+/* Define to 1 if you have the <zstd.h> header file. */
+#undef HAVE_ZSTD_H
+
/* Define to 1 if you have the <mbarrier.h> header file. */
#undef HAVE_MBARRIER_H
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a3f8d37258..a7f16758a4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
v11-0002-ZSTD-CLIENT-compression-WIP-working.patch
From 7970d29ded947b0ec3a564ef7be4e63c6f3c3537 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Tue, 15 Feb 2022 20:41:56 +0530
Subject: [PATCH 2/2] ZSTD-CLIENT-compression: WIP working
Adds tap test as well.
---
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 2 +
src/bin/pg_basebackup/bbstreamer_zstd.c | 213 ++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 28 ++-
src/bin/pg_verifybackup/t/010_client_untar.pl | 8 +
src/tools/msvc/Mkvcbuild.pm | 1 +
6 files changed, 251 insertions(+), 2 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/010_client_untar.pl
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 1d0db4f9d0..0035ebcef5 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -44,6 +44,7 @@ BBOBJS = \
bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_lz4.o \
+ bbstreamer_zstd.o \
bbstreamer_tar.o
all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index c2de77bacc..bfc624a863 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -209,6 +209,8 @@ extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
int compresslevel);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
+ int compresslevel);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
new file mode 100644
index 0000000000..d2e7dee136
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -0,0 +1,213 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_zstd.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_zstd.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+
+#ifdef HAVE_LIBZSTD
+
+#define OUT_BUF_SIZE (1024 * 8)
+
+typedef struct bbstreamer_zstd_frame
+{
+ bbstreamer base;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbstreamer_zstd_frame;
+
+static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
+ .content = bbstreamer_zstd_compressor_content,
+ .finalize = bbstreamer_zstd_compressor_finalize,
+ .free = bbstreamer_zstd_compressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs zstd compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+ size_t compressed_bound;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_compressor_ops;
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, OUT_BUF_SIZE);
+
+ streamer->cctx = ZSTD_createCCtx();
+ if (!streamer->cctx)
+ {
+ pg_log_error("could not create zstd compression context");
+ exit(1);
+ }
+
+ /* Initialize stream compression preferences */
+ ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compresslevel);
+
+ /*
+ * Find out the compression bound: the minimum destination capacity
+ * required in the worst case for the compression operation to succeed,
+ * given the source size and parameters.
+ */
+ compressed_bound = ZSTD_compressBound(streamer->base.bbs_buffer.maxlen);
+
+ /* Enlarge buffer if it falls short of compression bound. */
+ if (streamer->base.bbs_buffer.maxlen <= compressed_bound)
+ enlargeStringInfo(&streamer->base.bbs_buffer, compressed_bound);
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support zstd compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Compress the input data to the output buffer.
+ *
+ * Compute the compression bound from the remaining input length on each
+ * invocation, to make sure the output buffer has enough capacity for the
+ * compressed data. If the output buffer capacity falls short of the
+ * compression bound, forward the contents of the output buffer to the
+ * next streamer and empty the buffer.
+ */
+static void
+bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx, &mystreamer->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ {
+ pg_log_error("could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+ exit(1);
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx,
+ &mystreamer->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ {
+ pg_log_error("could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+ exit(1);
+ }
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeCCtx(mystreamer->cctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 3adb3a3845..81ce73a5fd 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1023,6 +1023,16 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ }
+ else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
{
*methodres = COMPRESSION_ZSTD;
@@ -1146,7 +1156,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool inject_manifest;
bool is_tar,
is_tar_gz,
- is_tar_lz4;
+ is_tar_lz4,
+ is_tar_zstd;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1169,6 +1180,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_lz4 = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+ /* Is this a ZSTD archive? */
+ is_tar_zstd = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1178,7 +1193,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4)
+ if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4
+ && !is_tar_zstd)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1250,6 +1266,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_lz4_compressor_new(streamer,
compresslevel);
}
+ else if (compressmethod == COMPRESSION_ZSTD)
+ {
+ strlcat(archive_filename, ".zst", sizeof(archive_filename));
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
+ streamer = bbstreamer_zstd_compressor_new(streamer,
+ compresslevel);
+ }
else
{
Assert(false); /* not reachable */
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
old mode 100644
new mode 100755
index 3616529390..c2a6161be6
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -42,6 +42,14 @@ my @test_configuration = (
'decompress_flags' => [ '-d' ],
'output_file' => 'base.tar',
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:5'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index bab81bd459..901e755d01 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -380,6 +380,7 @@ sub mkvcbuild
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_gzip.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_inject.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_lz4.c');
+ $pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_zstd.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_tar.c');
$pgbasebackup->AddLibrary('ws2_32.lib');
--
2.25.1
On Tue, Feb 15, 2022 at 12:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
There are superfluous changes to ./configure unrelated to the changes in
configure.ac. Probably because you're using a different version of autotools,
or a vendor's patched copy. You can remove the changes with git checkout -p or
similar.
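For example, assuming the stray hunks are confined to the generated
./configure script, something like this discards them interactively:

git checkout -p -- configure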
I noticed this already and fixed it in the version of the patch I
posted on the other thread.
+++ b/src/backend/replication/basebackup_zstd.c
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+#else

This should have a return; like what's added by 71cbbbbe8 and 302612a6c.
Also, the parens() around errcode aren't needed since last year.
The parens are still acceptable style, though. The return, I guess, is needed.
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+ Assert(compresslevel >= 0 && compresslevel <= 22);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,

This looks like dead code in assert builds.
If it's unreachable, it can be elog().
Actually, the right thing to do here is remove the assert, I think. I
don't believe that the code is unreachable. If I'm wrong and it is
unreachable then the test-and-ereport should be removed.
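For concreteness, the v12-0002 patch later in this thread settles on a
plain runtime range check, plus a compiler-quieting return in the
non-ZSTD branch; roughly:

bbsink *
bbsink_zstd_new(bbsink *next, int compresslevel)
{
#ifndef HAVE_LIBZSTD
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("zstd compression is not supported by this build")));
    return NULL;    /* keep compiler quiet */
#else
    Assert(next != NULL);

    /* reachable at runtime, so test and report rather than Assert */
    if (compresslevel < 0 || compresslevel > 22)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("zstd compression level %d is out of range",
                        compresslevel)));
    ...
#endif
}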
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for then next sink.

*the next sink ?
Yeah.
Does anyone plan to include this for pg15? If so, I think at least the WAL
compression should have support added too. I'd plan to rebase Michael's patch.
/messages/by-id/YNqWd2GSMrnqWIfx@paquier.xyz
Yes, I'd like to get this into PG15. It's very similar to the LZ4
compression support which was already committed, so it feels like
finishing it up and including it in the release makes a lot of sense.
I'm not against the idea of using ZSTD in other places where it makes
sense as well, but I think that's a separate issue from this patch. As
far as I'm concerned, either basebackup compression with ZSTD or WAL
compression with ZSTD could be committed even if the other is not, and
I plan to spend my time on this project, not that project. However, if
you're saying you want to work on the WAL compression stuff, I've got
no objection to that.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2022-Feb-14, Robert Haas wrote:
A more consistent way of writing the supported syntax would be like this:
-Z, --compress={[{client|server}-]{gzip|lz4}[:LEVEL]|LEVEL|none}

I would be somewhat inclined to leave the level-only variant
undocumented and instead write it like this:

-Z, --compress={[{client|server}-]{gzip|lz4}[:LEVEL]|none}
This is hard to interpret for humans though because of the nested
brackets and braces. It gets considerably easier if you split it in
separate variants:
-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
-Z, --compress=LEVEL
-Z, --compress=none
compress tar output with given compression method or level
or, if you choose to leave the level-only variant undocumented, then
-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
-Z, --compress=none
compress tar output with given compression method or level
There still are some nested brackets and braces, but the scope is
reduced enough that interpreting seems quite a bit simpler.
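Concretely, the variants map onto invocations like these (levels picked
arbitrarily for illustration):

pg_basebackup -D backup -Ft --compress=client-gzip:9
pg_basebackup -D backup -Ft --compress=5
pg_basebackup -D backup -Ft --compress=none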
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
On Wed, Feb 16, 2022 at 11:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
This is hard to interpret for humans though because of the nested
brackets and braces. It gets considerably easier if you split it in
separate variants:

-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
-Z, --compress=LEVEL
-Z, --compress=none
compress tar output with given compression method or level

or, if you choose to leave the level-only variant undocumented, then

-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
-Z, --compress=none
compress tar output with given compression method or level

There still are some nested brackets and braces, but the scope is
reduced enough that interpreting seems quite a bit simpler.
I could go for that. I'm also just noticing that "none" is not really
a compression method or level, and the statement that it can only
compress "tar" output is no longer correct, because server-side
compression can be used together with -Fp. So maybe we should change
the sentence afterward to something a bit more generic, like "specify
whether and how to compress the backup".
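For instance, the 0004 patch later in this thread exercises exactly that
combination:

pg_basebackup -D /tmp/zstd_C_D -Fp -Xfetch --compress=server-zstd:7

Here the server compresses the data for transmission and the client
decompresses and extracts it.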
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Everyone,
So, I went ahead and have now also implemented client-side decompression
for zstd.
Robert separated[1] the ZSTD configure switch from my original patch
of server-side compression and also added documentation related to
the switch. I have included that patch here in the patch series for
simplicity.
The server-side compression patch
0002-ZSTD-add-server-side-compression-support.patch has also taken care
of Justin Pryzby's comments[2]. Also, I made changes to the pg_basebackup
help as suggested by Álvaro Herrera.
[1]: /messages/by-id/CA+TgmobRisF-9ocqYDcMng6iSijGj1EZX99PgXA=3VVbWuahog@mail.gmail.com
[2]: /messages/by-id/20220215175944.GY31460@telsasoft.com
Regards,
Jeevan Ladhe
Attachments:
v12-0003-ZSTD-add-client-side-compression-support.patch (application/octet-stream)
From 7edb4f420982be174478666defd0dadab31362ae Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed, 16 Feb 2022 22:22:27 +0530
Subject: [PATCH 3/4] ZSTD: add client-side compression support.
ZSTD compression can now be performed on the client using
pg_basebackup -Ft --compress client-zstd[:LEVEL].
Example:
pg_basebackup -D /tmp/zstd_client -Ft -Xnone --compress=client-zstd
---
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 2 +
src/bin/pg_basebackup/bbstreamer_zstd.c | 202 ++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 28 ++-
src/bin/pg_verifybackup/t/010_client_untar.pl | 8 +
src/tools/msvc/Mkvcbuild.pm | 1 +
6 files changed, 240 insertions(+), 2 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/010_client_untar.pl
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 1d0db4f9d0..0035ebcef5 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -44,6 +44,7 @@ BBOBJS = \
bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_lz4.o \
+ bbstreamer_zstd.o \
bbstreamer_tar.o
all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index c2de77bacc..bfc624a863 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -209,6 +209,8 @@ extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
int compresslevel);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
+ int compresslevel);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
new file mode 100644
index 0000000000..0b20267cf4
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -0,0 +1,202 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_zstd.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_zstd.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbstreamer_zstd_frame
+{
+ bbstreamer base;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbstreamer_zstd_frame;
+
+static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
+ .content = bbstreamer_zstd_compressor_content,
+ .finalize = bbstreamer_zstd_compressor_finalize,
+ .free = bbstreamer_zstd_compressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs zstd compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_compressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, ZSTD_DStreamOutSize());
+
+ streamer->cctx = ZSTD_createCCtx();
+ if (!streamer->cctx)
+ {
+ pg_log_error("could not create zstd compression context");
+ exit(1);
+ }
+
+ /* Initialize stream compression preferences */
+ ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compresslevel);
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support zstd compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Compress the input data to the output buffer.
+ *
+ * Compute the compression bound for the remaining input on each
+ * invocation to make sure that the output buffer has enough capacity
+ * to accommodate the compressed data. If the output buffer capacity
+ * falls short of the compression bound, forward the contents of the
+ * output buffer to the next streamer and empty the buffer.
+ */
+static void
+bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the output buffer is not left with enough space, send the
+ * compressed bytes to the next streamer, and empty the buffer.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx, &mystreamer->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ {
+ pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ exit(1);
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the output buffer is not left with enough space, send the
+ * compressed bytes to the next streamer, and empty the buffer.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx,
+ &mystreamer->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ {
+ pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ exit(1);
+ }
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next streamer. */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeCCtx(mystreamer->cctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7202a5eae7..7ba752c1c9 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1023,6 +1023,16 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ }
+ else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
{
*methodres = COMPRESSION_ZSTD;
@@ -1146,7 +1156,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool inject_manifest;
bool is_tar,
is_tar_gz,
- is_tar_lz4;
+ is_tar_lz4,
+ is_tar_zstd;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1169,6 +1180,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_lz4 = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+ /* Is this a ZSTD archive? */
+ is_tar_zstd = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+
/*
 * We have to parse the archive if (1) we're supposed to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1178,7 +1193,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4)
+ if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4
+ && !is_tar_zstd)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1250,6 +1266,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_lz4_compressor_new(streamer,
compresslevel);
}
+ else if (compressmethod == COMPRESSION_ZSTD)
+ {
+ strlcat(archive_filename, ".zst", sizeof(archive_filename));
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
+ streamer = bbstreamer_zstd_compressor_new(streamer,
+ compresslevel);
+ }
else
{
Assert(false); /* not reachable */
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
old mode 100644
new mode 100755
index 3616529390..c2a6161be6
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -42,6 +42,14 @@ my @test_configuration = (
'decompress_flags' => [ '-d' ],
'output_file' => 'base.tar',
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:5'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index bab81bd459..901e755d01 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -380,6 +380,7 @@ sub mkvcbuild
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_gzip.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_inject.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_lz4.c');
+ $pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_zstd.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_tar.c');
$pgbasebackup->AddLibrary('ws2_32.lib');
--
2.25.1
v2-0001-Add-support-for-building-with-ZSTD.patch (application/octet-stream)
From c46df1ebef3000227251a7870d62fa49e1822e2c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 16 Feb 2022 10:36:36 -0500
Subject: [PATCH v2] Add support for building with ZSTD.
This commit doesn't actually add anything that uses ZSTD; that will be
done separately. It just puts the basic infrastructure into place.
Jeevan Ladhe and Robert Haas
---
configure | 271 ++++++++++++++++++++++++++++++
configure.ac | 33 ++++
doc/src/sgml/install-windows.sgml | 9 +
doc/src/sgml/installation.sgml | 9 +
src/Makefile.global.in | 1 +
src/include/pg_config.h.in | 9 +
src/tools/msvc/Solution.pm | 12 ++
src/tools/msvc/config_default.pl | 1 +
8 files changed, 345 insertions(+)
diff --git a/configure b/configure
index 9305555658..f07f689f1a 100755
--- a/configure
+++ b/configure
@@ -650,6 +650,7 @@ CFLAGS_ARMV8_CRC32C
CFLAGS_SSE42
have_win32_dbghelp
LIBOBJS
+ZSTD
LZ4
UUID_LIBS
LDAP_LIBS_BE
@@ -700,6 +701,9 @@ with_gnu_ld
LD
LDFLAGS_SL
LDFLAGS_EX
+ZSTD_LIBS
+ZSTD_CFLAGS
+with_zstd
LZ4_LIBS
LZ4_CFLAGS
with_lz4
@@ -869,6 +873,7 @@ with_libxslt
with_system_tzdata
with_zlib
with_lz4
+with_zstd
with_gnu_ld
with_ssl
with_openssl
@@ -898,6 +903,8 @@ XML2_CFLAGS
XML2_LIBS
LZ4_CFLAGS
LZ4_LIBS
+ZSTD_CFLAGS
+ZSTD_LIBS
LDFLAGS_EX
LDFLAGS_SL
PERL
@@ -1577,6 +1584,7 @@ Optional Packages:
use system time zone data in DIR
--without-zlib do not use Zlib
--with-lz4 build with LZ4 support
+ --with-zstd build with ZSTD support
--with-gnu-ld assume the C compiler uses GNU ld [default=no]
--with-ssl=LIB use LIB for SSL/TLS support (openssl)
--with-openssl obsolete spelling of --with-ssl=openssl
@@ -1606,6 +1614,8 @@ Some influential environment variables:
XML2_LIBS linker flags for XML2, overriding pkg-config
LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config
LZ4_LIBS linker flags for LZ4, overriding pkg-config
+ ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config
+ ZSTD_LIBS linker flags for ZSTD, overriding pkg-config
LDFLAGS_EX extra linker flags for linking executables only
LDFLAGS_SL extra linker flags for linking shared libraries only
PERL Perl program
@@ -9034,6 +9044,146 @@ fi
done
fi
+#
+# ZSTD
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with ZSTD support" >&5
+$as_echo_n "checking whether to build with ZSTD support... " >&6; }
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+ withval=$with_zstd;
+ case $withval in
+ yes)
+
+$as_echo "#define USE_ZSTD 1" >>confdefs.h
+
+ ;;
+ no)
+ :
+ ;;
+ *)
+ as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+ ;;
+ esac
+
+else
+ with_zstd=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_zstd" >&5
+$as_echo "$with_zstd" >&6; }
+
+
+if test "$with_zstd" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
+$as_echo_n "checking for libzstd... " >&6; }
+
+if test -n "$ZSTD_CFLAGS"; then
+ pkg_cv_ZSTD_CFLAGS="$ZSTD_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+if test -n "$ZSTD_LIBS"; then
+ pkg_cv_ZSTD_LIBS="$ZSTD_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+ _pkg_short_errors_supported=yes
+else
+ _pkg_short_errors_supported=no
+fi
+ if test $_pkg_short_errors_supported = yes; then
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libzstd" 2>&1`
+ else
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libzstd" 2>&1`
+ fi
+ # Put the nasty error message in config.log where it belongs
+ echo "$ZSTD_PKG_ERRORS" >&5
+
+ as_fn_error $? "Package requirements (libzstd) were not met:
+
+$ZSTD_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+ ZSTD_CFLAGS=$pkg_cv_ZSTD_CFLAGS
+ ZSTD_LIBS=$pkg_cv_ZSTD_LIBS
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -13130,6 +13280,56 @@ fi
fi
+if test "$with_zstd" = yes ; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+ Use char because int might match the return type of a GCC
+ builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+ ;
+ return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+ ac_cv_lib_zstd_ZSTD_compress=yes
+else
+ ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+ LIBS="-lzstd $LIBS"
+
+else
+ as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -13904,6 +14104,77 @@ done
fi
+if test -z "$ZSTD"; then
+ for ac_prog in zstd
+do
+ # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if ${ac_cv_path_ZSTD+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ case $ZSTD in
+ [\\/]* | ?:[\\/]*)
+ ac_cv_path_ZSTD="$ZSTD" # Let the user override the test with a path.
+ ;;
+ *)
+ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+ IFS=$as_save_IFS
+ test -z "$as_dir" && as_dir=.
+ for ac_exec_ext in '' $ac_executable_extensions; do
+ if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
+ ac_cv_path_ZSTD="$as_dir/$ac_word$ac_exec_ext"
+ $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+ break 2
+ fi
+done
+ done
+IFS=$as_save_IFS
+
+ ;;
+esac
+fi
+ZSTD=$ac_cv_path_ZSTD
+if test -n "$ZSTD"; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+else
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+ test -n "$ZSTD" && break
+done
+
+else
+ # Report the value of ZSTD in configure's output in all cases.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD" >&5
+$as_echo_n "checking for ZSTD... " >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+fi
+
+if test "$with_zstd" = yes; then
+ for ac_header in zstd.h
+do :
+ ac_fn_c_check_header_mongrel "$LINENO" "zstd.h" "ac_cv_header_zstd_h" "$ac_includes_default"
+if test "x$ac_cv_header_zstd_h" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_ZSTD_H 1
+_ACEOF
+
+else
+ as_fn_error $? "zstd.h header file is required for ZSTD" "$LINENO" 5
+fi
+
+done
+
+fi
+
if test "$with_gssapi" = yes ; then
for ac_header in gssapi/gssapi.h
do :
diff --git a/configure.ac b/configure.ac
index 16167329fc..729b23fbea 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1056,6 +1056,30 @@ if test "$with_lz4" = yes; then
done
fi
+#
+# ZSTD
+#
+AC_MSG_CHECKING([whether to build with ZSTD support])
+PGAC_ARG_BOOL(with, zstd, no, [build with ZSTD support],
+ [AC_DEFINE([USE_ZSTD], 1, [Define to 1 to build with ZSTD support. (--with-zstd)])])
+AC_MSG_RESULT([$with_zstd])
+AC_SUBST(with_zstd)
+
+if test "$with_zstd" = yes; then
+ PKG_CHECK_MODULES(ZSTD, libzstd)
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -1325,6 +1349,10 @@ if test "$with_lz4" = yes ; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
+if test "$with_zstd" = yes ; then
+ AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])])
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -1490,6 +1518,11 @@ if test "$with_lz4" = yes; then
AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])
fi
+PGAC_PATH_PROGS(ZSTD, zstd)
+if test "$with_zstd" = yes; then
+ AC_CHECK_HEADERS(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])])
+fi
+
if test "$with_gssapi" = yes ; then
AC_CHECK_HEADERS(gssapi/gssapi.h, [],
[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index 30dd0c7f75..d2f63db3f2 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -307,6 +307,15 @@ $ENV{MSBFLAGS}="/m";
</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><productname>ZSTD</productname></term>
+ <listitem><para>
+ Required for supporting <productname>ZSTD</productname> compression
+ method. Binaries and source can be downloaded from
+ <ulink url="https://github.com/facebook/zstd/releases"></ulink>.
+ </para></listitem>
+ </varlistentry>
+
<varlistentry>
<term><productname>OpenSSL</productname></term>
<listitem><para>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 655095f3b1..c6190f6955 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -989,6 +989,15 @@ build-postgresql:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--with-zstd</option></term>
+ <listitem>
+ <para>
+ Build with <productname>ZSTD</productname> compression support.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--with-ssl=<replaceable>LIBRARY</replaceable></option>
<indexterm>
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 9dcd54fcbd..c980444233 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -351,6 +351,7 @@ XGETTEXT = @XGETTEXT@
GZIP = gzip
BZIP2 = bzip2
LZ4 = @LZ4@
+ZSTD = @ZSTD@
DOWNLOAD = wget -O $@ --no-use-server-timestamps
#DOWNLOAD = curl -o $@
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 28a1f0e9f0..1912cf35de 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -352,6 +352,9 @@
/* Define to 1 if you have the `z' library (-lz). */
#undef HAVE_LIBZ
+/* Define to 1 if you have the `zstd' library (-lzstd). */
+#undef HAVE_LIBZSTD
+
/* Define to 1 if you have the `link' function. */
#undef HAVE_LINK
@@ -718,6 +721,9 @@
/* Define to 1 if the assembler supports X86_64's POPCNTQ instruction. */
#undef HAVE_X86_64_POPCNTQ
+/* Define to 1 if you have the <zstd.h> header file. */
+#undef HAVE_ZSTD_H
+
/* Define to 1 if the system has the type `_Bool'. */
#undef HAVE__BOOL
@@ -949,6 +955,9 @@
/* Define to select Win32-style shared memory. */
#undef USE_WIN32_SHARED_MEMORY
+/* Define to 1 to build with ZSTD support. (--with-zstd) */
+#undef USE_ZSTD
+
/* Define to 1 if `wcstombs_l' requires <xlocale.h>. */
#undef WCSTOMBS_L_IN_XLOCALE
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index e6f20679dc..087acfbaa1 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -539,6 +539,12 @@ sub GenerateFiles
$define{HAVE_LZ4_H} = 1;
$define{USE_LZ4} = 1;
}
+ if ($self->{options}->{zstd})
+ {
+ $define{HAVE_LIBZSTD} = 1;
+ $define{HAVE_ZSTD_H} = 1;
+ $define{USE_ZSTD} = 1;
+ }
if ($self->{options}->{openssl})
{
$define{USE_OPENSSL} = 1;
@@ -1081,6 +1087,11 @@ sub AddProject
$proj->AddIncludeDir($self->{options}->{lz4} . '\include');
$proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib');
}
+ if ($self->{options}->{zstd})
+ {
+ $proj->AddIncludeDir($self->{options}->{zstd} . '\include');
+ $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib');
+ }
if ($self->{options}->{uuid})
{
$proj->AddIncludeDir($self->{options}->{uuid} . '\include');
@@ -1193,6 +1204,7 @@ sub GetFakeConfigure
$cfg .= ' --with-libxml' if ($self->{options}->{xml});
$cfg .= ' --with-libxslt' if ($self->{options}->{xslt});
$cfg .= ' --with-lz4' if ($self->{options}->{lz4});
+ $cfg .= ' --with-zstd' if ($self->{options}->{zstd});
$cfg .= ' --with-gssapi' if ($self->{options}->{gss});
$cfg .= ' --with-icu' if ($self->{options}->{icu});
$cfg .= ' --with-tcl' if ($self->{options}->{tcl});
diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index 7a9b00be72..186849a09a 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -15,6 +15,7 @@ our $config = {
gss => undef, # --with-gssapi=<path>
icu => undef, # --with-icu=<path>
lz4 => undef, # --with-lz4=<path>
+ zstd => undef, # --with-zstd=<path>
nls => undef, # --enable-nls=<path>
tap_tests => undef, # --enable-tap-tests
tcl => undef, # --with-tcl=<path>
--
2.24.3 (Apple Git-128)
v12-0004-ZSTD-add-client-side-decompression-support.patch (application/octet-stream)
From 64ad5bbfd84927ae933a282181b98bc4dd768508 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed, 16 Feb 2022 22:51:47 +0530
Subject: [PATCH 4/4] ZSTD: add client-side decompression support.
ZSTD decompression of a backup compressed on the server can be
performed on the client using pg_basebackup -Fp --compress server-zstd.
Example:
pg_basebackup -D /tmp/zstd_C_D -Fp -Xfetch --compress=server-zstd:7
---
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_zstd.c | 133 +++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +
src/bin/pg_verifybackup/t/009_extract.pl | 5 +
4 files changed, 141 insertions(+)
mode change 100644 => 100755 src/bin/pg_verifybackup/t/009_extract.pl
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index bfc624a863..02d4c05df6 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -211,6 +211,7 @@ extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
int compresslevel);
+extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 0b20267cf4..83b59d63ba 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -27,6 +27,7 @@ typedef struct bbstreamer_zstd_frame
bbstreamer base;
ZSTD_CCtx *cctx;
+ ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
} bbstreamer_zstd_frame;
@@ -42,6 +43,19 @@ const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
.finalize = bbstreamer_zstd_compressor_finalize,
.free = bbstreamer_zstd_compressor_free
};
+
+static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
+ .content = bbstreamer_zstd_decompressor_content,
+ .finalize = bbstreamer_zstd_decompressor_finalize,
+ .free = bbstreamer_zstd_decompressor_free
+};
#endif
/*
@@ -200,3 +214,122 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
pfree(streamer);
}
#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of zstd
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_decompressor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_decompressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, ZSTD_DStreamOutSize());
+
+ streamer->dctx = ZSTD_createDCtx();
+ if (!streamer->dctx)
+ {
+ pg_log_error("could not create zstd decompression context");
+ exit(1);
+ }
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Decompress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer is full, pass on the decompressed data
+ * to the next streamer.
+ */
+static void
+bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t ret;
+
+ /*
+ * If the output buffer is full, forward its contents to the next
+ * streamer and reset the output buffer.
+ */
+ if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ ret = ZSTD_decompressStream(mystreamer->dctx,
+ &mystreamer->zstd_outBuf, &inBuf);
+
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not decompress data: %s", ZSTD_getErrorName(ret));
+ exit(1);
+ }
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ /*
+ * At the end of the stream, if there is any pending data in the output
+ * buffer, we must forward it to the next streamer.
+ */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeDCtx(mystreamer->dctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7ba752c1c9..c2cb04be1f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1332,6 +1332,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_gzip_decompressor_new(streamer);
else if (compressmethod == COMPRESSION_LZ4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
+ else if (compressmethod == COMPRESSION_ZSTD)
+ streamer = bbstreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
old mode 100644
new mode 100755
index c51cdf79f8..d30ba01742
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -31,6 +31,11 @@ my @test_configuration = (
'compression_method' => 'lz4',
'backup_flags' => ['--compress', 'server-lz4:5'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
--
2.25.1
v12-0002-ZSTD-add-server-side-compression-support.patch (application/octet-stream)
From 3ca1d68003f1d1db4f6782bbe9d9825ff9e61028 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed, 16 Feb 2022 22:10:02 +0530
Subject: [PATCH 2/4] ZSTD: add server-side compression support.
This patch introduces --compress=server-zstd[:LEVEL]
Add tap test.
Add config option --with-zstd.
Add documentation for ZSTD option.
Add pg_basebackup help for ZSTD option.
Example:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-zstd:4
---
doc/src/sgml/protocol.sgml | 5 +-
doc/src/sgml/ref/pg_basebackup.sgml | 38 +--
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_zstd.c | 294 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 20 +-
src/bin/pg_basebackup/pg_receivewal.c | 4 +
src/bin/pg_basebackup/walmethods.h | 1 +
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 9 +
src/include/replication/basebackup_sink.h | 1 +
11 files changed, 358 insertions(+), 23 deletions(-)
create mode 100644 src/backend/replication/basebackup_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1c5ab00879..c13d25051c 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2725,7 +2725,7 @@ The commands accepted in replication mode are:
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the supported methods are <literal>gzip</literal>
- and <literal>lz4</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal>.
</para>
</listitem>
</varlistentry>
@@ -2737,7 +2737,8 @@ The commands accepted in replication mode are:
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
For <literal>gzip</literal> the value should be an integer between 1
- and 9, and for <literal>lz4</literal> it should be between 1 and 12.
+ and 9, for <literal>lz4</literal> between 1 and 12, and for
+ <literal>zstd</literal> between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd1..4cf28a2a61 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,30 +417,32 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to <literal>gzip</literal> or
- <literal>lz4</literal>, or <literal>none</literal> for no
- compression. A compression level can be optionally specified, by
- appending the level number after a colon (<literal>:</literal>). If no
- level is specified, the default compression level will be used. If
- only a level is specified without mentioning an algorithm,
- <literal>gzip</literal> compression will be used if the level is
- greater than 0, and no compression will be used if the level is 0.
- </para>
- <para>
- When the tar format is used with <literal>gzip</literal> or
- <literal>lz4</literal>, the suffix <filename>.gz</filename> or
- <filename>.lz4</filename> will automatically be added to all tar
- filenames. When the plain format is used, client-side compression may
- not be specified, but it is still possible to request server-side
- compression. If this is done, the server will compress the backup for
- transmission, and the client will decompress and extract it.
+ The compression method can be set to <literal>gzip</literal>,
+ <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for no compression. A compression level can
+ optionally be specified, by appending the level number after a colon
+ (<literal>:</literal>). If no level is specified, the default
+ compression level will be used. If only a level is specified without
+ mentioning an algorithm, <literal>gzip</literal> compression will be
+ used if the level is greater than 0, and no compression will be used if
+ the level is 0.
+ </para>
+ <para>
+ When the tar format is used with <literal>gzip</literal>,
+ <literal>lz4</literal>, or <literal>zstd</literal>, the suffix
+ <filename>.gz</filename>, <filename>.lz4</filename>, or
+ <filename>.zst</filename> respectively will be automatically added to
+ all tar filenames. When the plain format is used, client-side
+ compression may not be specified, but it is still possible to request
+ server-side compression. If this is done, the server will compress the
+ backup for transmission, and the client will decompress and extract it.
</para>
<para>
When this option is used in combination with
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
compression is selected, but will not be compressed if server-side
- compresion or LZ4 compresion is selected.
+ compression, LZ4, or ZSTD compression is selected.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..2e6de7007f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -20,6 +20,7 @@ OBJS = \
basebackup_copy.o \
basebackup_gzip.o \
basebackup_lz4.o \
+ basebackup_zstd.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0bf28b55d7..2378ce5c5e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
{
BACKUP_COMPRESSION_NONE,
BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
} basebackup_compression_type;
typedef struct
@@ -906,6 +907,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
else if (strcmp(optval, "lz4") == 0)
opt->compression = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(optval, "zstd") == 0)
+ opt->compression = BACKUP_COMPRESSION_ZSTD;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1026,6 +1029,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
sink = bbsink_gzip_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
+ sink = bbsink_zstd_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
new file mode 100644
index 0000000000..da06b25732
--- /dev/null
+++ b/src/backend/replication/basebackup_zstd.c
@@ -0,0 +1,294 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_zstd.c
+ * Basebackup sink implementing zstd compression.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbsink_zstd
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level */
+ int compresslevel;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbsink_zstd;
+
+static void bbsink_zstd_begin_backup(bbsink *sink);
+static void bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_zstd_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_zstd_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_zstd_end_archive(bbsink *sink);
+static void bbsink_zstd_cleanup(bbsink *sink);
+static void bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+const bbsink_ops bbsink_zstd_ops = {
+ .begin_backup = bbsink_zstd_begin_backup,
+ .begin_archive = bbsink_zstd_begin_archive,
+ .archive_contents = bbsink_zstd_archive_contents,
+ .end_archive = bbsink_zstd_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_zstd_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_zstd_end_backup,
+ .cleanup = bbsink_zstd_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs zstd compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+ return NULL; /* keep compiler quiet */
+#else
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_zstd));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_zstd_begin_backup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t output_buffer_bound;
+
+ mysink->cctx = ZSTD_createCCtx();
+ if (!mysink->cctx)
+ elog(ERROR, "could not create zstd compression context");
+
+ ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Make sure that the next sink's bbs_buffer is big enough to accommodate
+ * the compressed input buffer.
+ */
+ output_buffer_bound = ZSTD_compressBound(mysink->base.bbs_buffer_length);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ char *zstd_archive_name;
+
+ /*
+ * At the start of each archive we reset the state to start a new
+ * compression operation. The parameters are sticky and will stick
+ * around, since we are resetting with the option ZSTD_reset_session_only.
+ */
+ ZSTD_CCtx_reset(mysink->cctx, ZSTD_reset_session_only);
+
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ /* Add ".zst" to the archive name. */
+ zstd_archive_name = psprintf("%s.zst", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, zstd_archive_name);
+ pfree(zstd_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_zstd_end_archive() is invoked.
+ */
+static void
+bbsink_zstd_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ ZSTD_inBuffer inBuf = {mysink->base.bbs_buffer, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx, &mysink->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * There might be some data inside zstd's internal buffers; we need to flush
+ * it out, end the zstd frame, and then forward the result to the successor
+ * sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_zstd_end_archive(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx,
+ &mysink->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ if (mysink->zstd_outBuf.pos > 0)
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Free the resources and context.
+ */
+static void
+bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+
+ bbsink_forward_end_backup(sink, endptr, endtli);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling ZSTD_freeCCtx if needed, to avoid a memory leak.
+ */
+static void
+bbsink_zstd_cleanup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context if not already released. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0003b59615..7202a5eae7 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,8 +391,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
- " compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
+ printf(_(" -Z, --compress=none\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -1023,6 +1023,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1970,6 +1975,9 @@ BaseBackup(void)
case COMPRESSION_LZ4:
compressmethodstr = "lz4";
break;
+ case COMPRESSION_ZSTD:
+ compressmethodstr = "zstd";
+ break;
default:
Assert(false);
break;
@@ -2819,6 +2827,14 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case COMPRESSION_ZSTD:
+ if (compresslevel > 22)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 22",
+ compresslevel, "zstd");
+ exit(1);
+ }
+ break;
}
/*
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ccb215c398..9b7656c692 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -904,6 +904,10 @@ main(int argc, char **argv)
exit(1);
#endif
break;
+ case COMPRESSION_ZSTD:
+ pg_log_error("compression with %s is not yet supported", "ZSTD");
+ exit(1);
+
}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 2dfb353baa..ec54019cfc 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -24,6 +24,7 @@ typedef enum
{
COMPRESSION_GZIP,
COMPRESSION_LZ4,
+ COMPRESSION_ZSTD,
COMPRESSION_NONE
} WalCompressionMethod;
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 851233a6e0..596df15118 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -10,6 +10,7 @@ export TAR
# name.
export GZIP_PROGRAM=$(GZIP)
export LZ4=$(LZ4)
+export ZSTD=$(ZSTD)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index 6927ca4c74..1ccc6cb9df
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -43,6 +43,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d', '-m'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
@@ -108,6 +116,7 @@ for my $tc (@test_configuration)
# Cleanup.
unlink($backup_path . '/backup_manifest');
unlink($backup_path . '/base.tar');
+ unlink($backup_path . '/' . $tc->{'backup_archive'});
rmtree($extract_path);
}
}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a3f8d37258..a7f16758a4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
On Wed, Feb 16, 2022 at 12:46 PM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
So, I went ahead and have now also implemented client side decompression
for zstd. Robert separated[1] the ZSTD configure switch from my original
patch of server side compression and also added documentation related to
the switch. I have included that patch here in the patch series for
simplicity. The server side compression patch
0002-ZSTD-add-server-side-compression-support.patch has also taken care
of Justin Pryzby's comments[2]. Also, I made changes to the pg_basebackup
help output as suggested by Álvaro Herrera.
The first hunk of the documentation changes is missing a comma between
gzip and lz4.
+ * At the start of each archive we reset the state to start a new
+ * compression operation. The parameters are sticky and they would stick
+ * around as we are resetting with option ZSTD_reset_session_only.
I don't think "would" is what you mean here. If you say something
would stick around, that means it could be that way, but it isn't. ("I
would go to the store and buy some apples, but I know they don't have
any so there's no point.") I think you mean "will".
- printf(_(" -Z,
--compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
- " compress tar output with given
compression method or level\n"));
+ printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
+ printf(_(" -Z, --compress=none\n"));
You deleted a line that you should have preserved here.
Overall there doesn't seem to be much to complain about here on a
first read-through. It will be good if we can also fix
CreateWalTarMethod to support LZ4 and ZSTD.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for the comments, Robert. I have addressed them in the attached
patch v13-0002-ZSTD-add-server-side-compression-support.patch.
The rest of the patches are the same as in v12; I have just bumped the
version number.
It will be good if we can also fix
CreateWalTarMethod to support LZ4 and ZSTD.
OK, we will look into it; either Dipesh or I will take care of it.
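To illustrate the direction (just an untested sketch -- the names
zstd_walfile and zstd_wal_write are made up, and the real walmethods.c
plumbing is more involved), the tar method's write path would keep a
streaming context per file and drain it the same way the bbstreamer
code does:

#include <unistd.h>
#include <zstd.h>

typedef struct
{
	int			fd;			/* underlying tar file descriptor */
	ZSTD_CCtx  *cctx;		/* streaming compression context */
	void	   *outbuf;		/* scratch output buffer */
	size_t		outbufsize;	/* size of the scratch buffer */
} zstd_walfile;

/* Compress "count" bytes from "buf" and write them out; -1 on error. */
static ssize_t
zstd_wal_write(zstd_walfile *f, const void *buf, size_t count)
{
	ZSTD_inBuffer in = {buf, count, 0};

	while (in.pos < in.size)
	{
		ZSTD_outBuffer out = {f->outbuf, f->outbufsize, 0};
		size_t		ret;

		ret = ZSTD_compressStream2(f->cctx, &out, &in, ZSTD_e_continue);
		if (ZSTD_isError(ret))
			return -1;
		/* Write whatever this round produced before reusing the buffer. */
		if (out.pos > 0 &&
			write(f->fd, out.dst, out.pos) != (ssize_t) out.pos)
			return -1;
	}
	return count;
}

Closing the file would then need a ZSTD_e_end flush loop like the one
in bbsink_zstd_end_archive().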
Regards,
Jeevan Ladhe
Attachments:
v13-0004-ZSTD-add-client-side-decompression-support.patch
From fb7c1e49afaea669f2baa7f05ed2eaf5ae003d81 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed, 16 Feb 2022 22:51:47 +0530
Subject: [PATCH 4/4] ZSTD: add client-side decompression support.
ZSTD decompression of a backup compressed on the server can be
performed on the client, e.g. using pg_basebackup -Fp --compress=server-zstd.
Example:
pg_basebackup -D /tmp/zstd_C_D -Fp -Xfetch --compress=server-zstd:7
---
src/bin/pg_basebackup/bbstreamer.h | 1 +
src/bin/pg_basebackup/bbstreamer_zstd.c | 133 +++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 2 +
src/bin/pg_verifybackup/t/009_extract.pl | 5 +
4 files changed, 141 insertions(+)
mode change 100644 => 100755 src/bin/pg_verifybackup/t/009_extract.pl
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index bfc624a863..02d4c05df6 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -211,6 +211,7 @@ extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
int compresslevel);
+extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 0b20267cf4..83b59d63ba 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -27,6 +27,7 @@ typedef struct bbstreamer_zstd_frame
bbstreamer base;
ZSTD_CCtx *cctx;
+ ZSTD_DCtx *dctx;
ZSTD_outBuffer zstd_outBuf;
} bbstreamer_zstd_frame;
@@ -42,6 +43,19 @@ const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
.finalize = bbstreamer_zstd_compressor_finalize,
.free = bbstreamer_zstd_compressor_free
};
+
+static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
+ .content = bbstreamer_zstd_decompressor_content,
+ .finalize = bbstreamer_zstd_decompressor_finalize,
+ .free = bbstreamer_zstd_decompressor_free
+};
#endif
/*
@@ -200,3 +214,122 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
pfree(streamer);
}
#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of zstd
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_decompressor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_decompressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, ZSTD_DStreamOutSize());
+
+ streamer->dctx = ZSTD_createDCtx();
+ if (!streamer->dctx)
+ {
+ pg_log_error("could not create zstd decompression context");
+ exit(1);
+ }
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Decompress the input data to the output buffer until we run out of
+ * input data. Each time the output buffer fills up, pass the decompressed
+ * data on to the next streamer.
+ */
+static void
+bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t ret;
+
+ /*
+ * If the output buffer is full, forward its contents to the next
+ * streamer and reset the buffer.
+ */
+ if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
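+ /*
+ * ZSTD_decompressStream() returns 0 once a frame is completely
+ * decoded and flushed, a hint about the preferred next input size
+ * otherwise, or an error code.
+ */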
+ ret = ZSTD_decompressStream(mystreamer->dctx,
+ &mystreamer->zstd_outBuf, &inBuf);
+
+ if (ZSTD_isError(ret))
+ pg_log_error("could not decompress data: %s", ZSTD_getErrorName(ret));
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ /*
+ * End of the stream: if there is any pending data in the output buffer,
+ * we must forward it to the next streamer. Only zstd_outBuf.pos bytes
+ * of the buffer hold valid decompressed data.
+ */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeDCtx(mystreamer->dctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 18bd0df9a5..cef66d3e9e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1333,6 +1333,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_gzip_decompressor_new(streamer);
else if (compressmethod == COMPRESSION_LZ4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
+ else if (compressmethod == COMPRESSION_ZSTD)
+ streamer = bbstreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
old mode 100644
new mode 100755
index c51cdf79f8..d30ba01742
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -31,6 +31,11 @@ my @test_configuration = (
'compression_method' => 'lz4',
'backup_flags' => ['--compress', 'server-lz4:5'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
--
2.25.1
v13-0001-Add-support-for-building-with-ZSTD.patch
From cfa0448be55b7b2f9131ffec656a69f1779e0f5e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 16 Feb 2022 10:36:36 -0500
Subject: [PATCH 1/4] Add support for building with ZSTD.
This commit doesn't actually add anything that uses ZSTD; that will be
done separately. It just puts the basic infrastructure into place.
Jeevan Ladhe and Robert Haas
---
configure | 271 ++++++++++++++++++++++++++++++
configure.ac | 33 ++++
doc/src/sgml/install-windows.sgml | 9 +
doc/src/sgml/installation.sgml | 9 +
src/Makefile.global.in | 1 +
src/include/pg_config.h.in | 9 +
src/tools/msvc/Solution.pm | 12 ++
src/tools/msvc/config_default.pl | 1 +
8 files changed, 345 insertions(+)
diff --git a/configure b/configure
index 9305555658..f07f689f1a 100755
--- a/configure
+++ b/configure
@@ -650,6 +650,7 @@ CFLAGS_ARMV8_CRC32C
CFLAGS_SSE42
have_win32_dbghelp
LIBOBJS
+ZSTD
LZ4
UUID_LIBS
LDAP_LIBS_BE
@@ -700,6 +701,9 @@ with_gnu_ld
LD
LDFLAGS_SL
LDFLAGS_EX
+ZSTD_LIBS
+ZSTD_CFLAGS
+with_zstd
LZ4_LIBS
LZ4_CFLAGS
with_lz4
@@ -869,6 +873,7 @@ with_libxslt
with_system_tzdata
with_zlib
with_lz4
+with_zstd
with_gnu_ld
with_ssl
with_openssl
@@ -898,6 +903,8 @@ XML2_CFLAGS
XML2_LIBS
LZ4_CFLAGS
LZ4_LIBS
+ZSTD_CFLAGS
+ZSTD_LIBS
LDFLAGS_EX
LDFLAGS_SL
PERL
@@ -1577,6 +1584,7 @@ Optional Packages:
use system time zone data in DIR
--without-zlib do not use Zlib
--with-lz4 build with LZ4 support
+ --with-zstd build with ZSTD support
--with-gnu-ld assume the C compiler uses GNU ld [default=no]
--with-ssl=LIB use LIB for SSL/TLS support (openssl)
--with-openssl obsolete spelling of --with-ssl=openssl
@@ -1606,6 +1614,8 @@ Some influential environment variables:
XML2_LIBS linker flags for XML2, overriding pkg-config
LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config
LZ4_LIBS linker flags for LZ4, overriding pkg-config
+ ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config
+ ZSTD_LIBS linker flags for ZSTD, overriding pkg-config
LDFLAGS_EX extra linker flags for linking executables only
LDFLAGS_SL extra linker flags for linking shared libraries only
PERL Perl program
@@ -9034,6 +9044,146 @@ fi
done
fi
+#
+# ZSTD
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with ZSTD support" >&5
+$as_echo_n "checking whether to build with ZSTD support... " >&6; }
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+ withval=$with_zstd;
+ case $withval in
+ yes)
+
+$as_echo "#define USE_ZSTD 1" >>confdefs.h
+
+ ;;
+ no)
+ :
+ ;;
+ *)
+ as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+ ;;
+ esac
+
+else
+ with_zstd=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_zstd" >&5
+$as_echo "$with_zstd" >&6; }
+
+
+if test "$with_zstd" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
+$as_echo_n "checking for libzstd... " >&6; }
+
+if test -n "$ZSTD_CFLAGS"; then
+ pkg_cv_ZSTD_CFLAGS="$ZSTD_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+if test -n "$ZSTD_LIBS"; then
+ pkg_cv_ZSTD_LIBS="$ZSTD_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+ if test -n "$PKG_CONFIG" && \
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libzstd\""; } >&5
+ ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+ ac_status=$?
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+ test $ac_status = 0; }; then
+ pkg_cv_ZSTD_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+ pkg_failed=yes
+fi
+ else
+ pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+ _pkg_short_errors_supported=yes
+else
+ _pkg_short_errors_supported=no
+fi
+ if test $_pkg_short_errors_supported = yes; then
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libzstd" 2>&1`
+ else
+ ZSTD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libzstd" 2>&1`
+ fi
+ # Put the nasty error message in config.log where it belongs
+ echo "$ZSTD_PKG_ERRORS" >&5
+
+ as_fn_error $? "Package requirements (libzstd) were not met:
+
+$ZSTD_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables ZSTD_CFLAGS
+and ZSTD_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+ ZSTD_CFLAGS=$pkg_cv_ZSTD_CFLAGS
+ ZSTD_LIBS=$pkg_cv_ZSTD_LIBS
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -13130,6 +13280,56 @@ fi
fi
+if test "$with_zstd" = yes ; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+ Use char because int might match the return type of a GCC
+ builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+ ;
+ return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+ ac_cv_lib_zstd_ZSTD_compress=yes
+else
+ ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+ LIBS="-lzstd $LIBS"
+
+else
+ as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -13904,6 +14104,77 @@ done
fi
+if test -z "$ZSTD"; then
+ for ac_prog in zstd
+do
+ # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if ${ac_cv_path_ZSTD+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ case $ZSTD in
+ [\\/]* | ?:[\\/]*)
+ ac_cv_path_ZSTD="$ZSTD" # Let the user override the test with a path.
+ ;;
+ *)
+ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+ IFS=$as_save_IFS
+ test -z "$as_dir" && as_dir=.
+ for ac_exec_ext in '' $ac_executable_extensions; do
+ if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
+ ac_cv_path_ZSTD="$as_dir/$ac_word$ac_exec_ext"
+ $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+ break 2
+ fi
+done
+ done
+IFS=$as_save_IFS
+
+ ;;
+esac
+fi
+ZSTD=$ac_cv_path_ZSTD
+if test -n "$ZSTD"; then
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+else
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+ test -n "$ZSTD" && break
+done
+
+else
+ # Report the value of ZSTD in configure's output in all cases.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD" >&5
+$as_echo_n "checking for ZSTD... " >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ZSTD" >&5
+$as_echo "$ZSTD" >&6; }
+fi
+
+if test "$with_zstd" = yes; then
+ for ac_header in zstd.h
+do :
+ ac_fn_c_check_header_mongrel "$LINENO" "zstd.h" "ac_cv_header_zstd_h" "$ac_includes_default"
+if test "x$ac_cv_header_zstd_h" = xyes; then :
+ cat >>confdefs.h <<_ACEOF
+#define HAVE_ZSTD_H 1
+_ACEOF
+
+else
+ as_fn_error $? "zstd.h header file is required for ZSTD" "$LINENO" 5
+fi
+
+done
+
+fi
+
if test "$with_gssapi" = yes ; then
for ac_header in gssapi/gssapi.h
do :
diff --git a/configure.ac b/configure.ac
index 16167329fc..729b23fbea 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1056,6 +1056,30 @@ if test "$with_lz4" = yes; then
done
fi
+#
+# ZSTD
+#
+AC_MSG_CHECKING([whether to build with ZSTD support])
+PGAC_ARG_BOOL(with, zstd, no, [build with ZSTD support],
+ [AC_DEFINE([USE_ZSTD], 1, [Define to 1 to build with ZSTD support. (--with-zstd)])])
+AC_MSG_RESULT([$with_zstd])
+AC_SUBST(with_zstd)
+
+if test "$with_zstd" = yes; then
+ PKG_CHECK_MODULES(ZSTD, libzstd)
+ # We only care about -I, -D, and -L switches;
+ # note that -lzstd will be added by AC_CHECK_LIB below.
+ for pgac_option in $ZSTD_CFLAGS; do
+ case $pgac_option in
+ -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+ esac
+ done
+ for pgac_option in $ZSTD_LIBS; do
+ case $pgac_option in
+ -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+ esac
+ done
+fi
#
# Assignments
#
@@ -1325,6 +1349,10 @@ if test "$with_lz4" = yes ; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
+if test "$with_zstd" = yes ; then
+ AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])])
+fi
+
# Note: We can test for libldap_r only after we know PTHREAD_LIBS;
# also, on AIX, we may need to have openssl in LIBS for this step.
if test "$with_ldap" = yes ; then
@@ -1490,6 +1518,11 @@ if test "$with_lz4" = yes; then
AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])
fi
+PGAC_PATH_PROGS(ZSTD, zstd)
+if test "$with_zstd" = yes; then
+ AC_CHECK_HEADERS(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])])
+fi
+
if test "$with_gssapi" = yes ; then
AC_CHECK_HEADERS(gssapi/gssapi.h, [],
[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index 30dd0c7f75..d2f63db3f2 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -307,6 +307,15 @@ $ENV{MSBFLAGS}="/m";
</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><productname>ZSTD</productname></term>
+ <listitem><para>
+ Required for supporting <productname>ZSTD</productname> compression
+ method. Binaries and source can be downloaded from
+ <ulink url="https://github.com/facebook/zstd/releases"></ulink>.
+ </para></listitem>
+ </varlistentry>
+
<varlistentry>
<term><productname>OpenSSL</productname></term>
<listitem><para>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 655095f3b1..c6190f6955 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -989,6 +989,15 @@ build-postgresql:
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--with-zstd</option></term>
+ <listitem>
+ <para>
+ Build with <productname>ZSTD</productname> compression support.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--with-ssl=<replaceable>LIBRARY</replaceable></option>
<indexterm>
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 9dcd54fcbd..c980444233 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -351,6 +351,7 @@ XGETTEXT = @XGETTEXT@
GZIP = gzip
BZIP2 = bzip2
LZ4 = @LZ4@
+ZSTD = @ZSTD@
DOWNLOAD = wget -O $@ --no-use-server-timestamps
#DOWNLOAD = curl -o $@
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 28a1f0e9f0..1912cf35de 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -352,6 +352,9 @@
/* Define to 1 if you have the `z' library (-lz). */
#undef HAVE_LIBZ
+/* Define to 1 if you have the `zstd' library (-lzstd). */
+#undef HAVE_LIBZSTD
+
/* Define to 1 if you have the `link' function. */
#undef HAVE_LINK
@@ -718,6 +721,9 @@
/* Define to 1 if the assembler supports X86_64's POPCNTQ instruction. */
#undef HAVE_X86_64_POPCNTQ
+/* Define to 1 if you have the <zstd.h> header file. */
+#undef HAVE_ZSTD_H
+
/* Define to 1 if the system has the type `_Bool'. */
#undef HAVE__BOOL
@@ -949,6 +955,9 @@
/* Define to select Win32-style shared memory. */
#undef USE_WIN32_SHARED_MEMORY
+/* Define to 1 to build with ZSTD support. (--with-zstd) */
+#undef USE_ZSTD
+
/* Define to 1 if `wcstombs_l' requires <xlocale.h>. */
#undef WCSTOMBS_L_IN_XLOCALE
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index e6f20679dc..087acfbaa1 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -539,6 +539,12 @@ sub GenerateFiles
$define{HAVE_LZ4_H} = 1;
$define{USE_LZ4} = 1;
}
+ if ($self->{options}->{zstd})
+ {
+ $define{HAVE_LIBZSTD} = 1;
+ $define{HAVE_ZSTD_H} = 1;
+ $define{USE_ZSTD} = 1;
+ }
if ($self->{options}->{openssl})
{
$define{USE_OPENSSL} = 1;
@@ -1081,6 +1087,11 @@ sub AddProject
$proj->AddIncludeDir($self->{options}->{lz4} . '\include');
$proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib');
}
+ if ($self->{options}->{zstd})
+ {
+ $proj->AddIncludeDir($self->{options}->{zstd} . '\include');
+ $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib');
+ }
if ($self->{options}->{uuid})
{
$proj->AddIncludeDir($self->{options}->{uuid} . '\include');
@@ -1193,6 +1204,7 @@ sub GetFakeConfigure
$cfg .= ' --with-libxml' if ($self->{options}->{xml});
$cfg .= ' --with-libxslt' if ($self->{options}->{xslt});
$cfg .= ' --with-lz4' if ($self->{options}->{lz4});
+ $cfg .= ' --with-zstd' if ($self->{options}->{zstd});
$cfg .= ' --with-gssapi' if ($self->{options}->{gss});
$cfg .= ' --with-icu' if ($self->{options}->{icu});
$cfg .= ' --with-tcl' if ($self->{options}->{tcl});
diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index 7a9b00be72..186849a09a 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -15,6 +15,7 @@ our $config = {
gss => undef, # --with-gssapi=<path>
icu => undef, # --with-icu=<path>
lz4 => undef, # --with-lz4=<path>
+ zstd => undef, # --with-zstd=<path>
nls => undef, # --enable-nls=<path>
tap_tests => undef, # --enable-tap-tests
tcl => undef, # --with-tcl=<path>
--
2.25.1
v13-0002-ZSTD-add-server-side-compression-support.patch
From 0f0770583989fc55afa8003046ea9b85af2142b3 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Thu, 17 Feb 2022 06:55:53 +0530
Subject: [PATCH 2/4] ZSTD: add server-side compression support.
This patch introduces --compress=server-zstd[:LEVEL]
Add tap test.
Add config option --with-zstd.
Add documentation for ZSTD option.
Add pg_basebackup help for ZSTD option.
Example:
pg_basebackup -t server:/tmp/data_test -Xnone --compress=server-zstd:4
---
doc/src/sgml/protocol.sgml | 7 +-
doc/src/sgml/ref/pg_basebackup.sgml | 38 +--
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_zstd.c | 294 ++++++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 19 +-
src/bin/pg_basebackup/pg_receivewal.c | 4 +
src/bin/pg_basebackup/walmethods.h | 1 +
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 9 +
src/include/replication/basebackup_sink.h | 1 +
11 files changed, 359 insertions(+), 23 deletions(-)
create mode 100644 src/backend/replication/basebackup_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/008_untar.pl
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1c5ab00879..8fe638767d 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2724,8 +2724,8 @@ The commands accepted in replication mode are:
<listitem>
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the supported methods are <literal>gzip</literal>
- and <literal>lz4</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal>.
</para>
</listitem>
</varlistentry>
@@ -2737,7 +2737,8 @@ The commands accepted in replication mode are:
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
For <literal>gzip</literal> the value should be an integer between 1
- and 9, and for <literal>lz4</literal> it should be between 1 and 12.
+ and 9, for <literal>lz4</literal> between 1 and 12, and for
+ <literal>zstd</literal> it should be between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd1..4cf28a2a61 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,30 +417,32 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to <literal>gzip</literal> or
- <literal>lz4</literal>, or <literal>none</literal> for no
- compression. A compression level can be optionally specified, by
- appending the level number after a colon (<literal>:</literal>). If no
- level is specified, the default compression level will be used. If
- only a level is specified without mentioning an algorithm,
- <literal>gzip</literal> compression will be used if the level is
- greater than 0, and no compression will be used if the level is 0.
- </para>
- <para>
- When the tar format is used with <literal>gzip</literal> or
- <literal>lz4</literal>, the suffix <filename>.gz</filename> or
- <filename>.lz4</filename> will automatically be added to all tar
- filenames. When the plain format is used, client-side compression may
- not be specified, but it is still possible to request server-side
- compression. If this is done, the server will compress the backup for
- transmission, and the client will decompress and extract it.
+ The compression method can be set to <literal>gzip</literal>,
+ <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for no compression. A compression level can
+ optionally be specified, by appending the level number after a colon
+ (<literal>:</literal>). If no level is specified, the default
+ compression level will be used. If only a level is specified without
+ mentioning an algorithm, <literal>gzip</literal> compression will be
+ used if the level is greater than 0, and no compression will be used if
+ the level is 0.
+ </para>
+ <para>
+ When the tar format is used with <literal>gzip</literal>,
+ <literal>lz4</literal>, or <literal>zstd</literal>, the suffix
+ <filename>.gz</filename>, <filename>.lz4</filename>, or
+ <filename>.zst</filename> respectively will be automatically added to
+ all tar filenames. When the plain format is used, client-side
+ compression may not be specified, but it is still possible to request
+ server-side compression. If this is done, the server will compress the
+ backup for transmission, and the client will decompress and extract it.
</para>
<para>
When this option is used in combination with
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
compression is selected, but will not be compressed if server-side
- compresion or LZ4 compresion is selected.
+ compression, LZ4, or ZSTD compression is selected.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..2e6de7007f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -20,6 +20,7 @@ OBJS = \
basebackup_copy.o \
basebackup_gzip.o \
basebackup_lz4.o \
+ basebackup_zstd.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0bf28b55d7..2378ce5c5e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
{
BACKUP_COMPRESSION_NONE,
BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
} basebackup_compression_type;
typedef struct
@@ -906,6 +907,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
else if (strcmp(optval, "lz4") == 0)
opt->compression = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(optval, "zstd") == 0)
+ opt->compression = BACKUP_COMPRESSION_ZSTD;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1026,6 +1029,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
sink = bbsink_gzip_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
+ sink = bbsink_zstd_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
new file mode 100644
index 0000000000..24993a5bb6
--- /dev/null
+++ b/src/backend/replication/basebackup_zstd.c
@@ -0,0 +1,294 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_zstd.c
+ * Basebackup sink implementing zstd compression.
+ *
+ * Portions Copyright (c) 2010-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbsink_zstd
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level */
+ int compresslevel;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbsink_zstd;
+
+static void bbsink_zstd_begin_backup(bbsink *sink);
+static void bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_zstd_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_zstd_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_zstd_end_archive(bbsink *sink);
+static void bbsink_zstd_cleanup(bbsink *sink);
+static void bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+const bbsink_ops bbsink_zstd_ops = {
+ .begin_backup = bbsink_zstd_begin_backup,
+ .begin_archive = bbsink_zstd_begin_archive,
+ .archive_contents = bbsink_zstd_archive_contents,
+ .end_archive = bbsink_zstd_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_zstd_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_zstd_end_backup,
+ .cleanup = bbsink_zstd_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs zstd compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+ return NULL; /* keep compiler quiet */
+#else
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_zstd));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_zstd_begin_backup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t output_buffer_bound;
+
+ mysink->cctx = ZSTD_createCCtx();
+ if (!mysink->cctx)
+ elog(ERROR, "could not create zstd compression context");
+
+ ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Make sure that the next sink's bbs_buffer is big enough to accommodate
+ * the compressed input buffer.
+ */
+ output_buffer_bound = ZSTD_compressBound(mysink->base.bbs_buffer_length);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ char *zstd_archive_name;
+
+ /*
+ * At the start of each archive we reset the state to start a new
+ * compression operation. The parameters are sticky and they will stick
+ * around as we are resetting with option ZSTD_reset_session_only.
+ */
+ ZSTD_CCtx_reset(mysink->cctx, ZSTD_reset_session_only);
+
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ /* Add ".zst" to the archive name. */
+ zstd_archive_name = psprintf("%s.zst", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, zstd_archive_name);
+ pfree(zstd_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the free space in the output buffer falls below the
+ * compression bound for the remaining input, invoke the archive_contents()
+ * method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_zstd_end_archive() is invoked.
+ */
+static void
+bbsink_zstd_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ ZSTD_inBuffer inBuf = {mysink->base.bbs_buffer, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
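+ /*
+ * In ZSTD_e_continue mode, ZSTD_compressStream2() returns the
+ * minimum amount of data still remaining to be flushed from its
+ * internal buffers, or an error code.
+ */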
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx, &mysink->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * There might be some data left inside zstd's internal buffers; we need to
+ * flush that out, end the zstd frame, and then forward the result to the
+ * successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_zstd_end_archive(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t yet_to_flush;
+
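+ /*
+ * In ZSTD_e_end mode, ZSTD_compressStream2() returns the number of
+ * bytes left to flush for the current frame; looping until it
+ * returns zero guarantees the frame is fully terminated.
+ */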
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx,
+ &mysink->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ if (mysink->zstd_outBuf.pos > 0)
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Free the resources and context.
+ */
+static void
+bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+
+ bbsink_forward_end_backup(sink, endptr, endtli);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling ZSTD_freeCCtx if needed, to avoid a memory leak.
+ */
+static void
+bbsink_zstd_cleanup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context if not already released. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0003b59615..304d510220 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,8 +391,9 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"
" compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=none do not compress tar output\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -1023,6 +1024,11 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1970,6 +1976,9 @@ BaseBackup(void)
case COMPRESSION_LZ4:
compressmethodstr = "lz4";
break;
+ case COMPRESSION_ZSTD:
+ compressmethodstr = "zstd";
+ break;
default:
Assert(false);
break;
@@ -2819,6 +2828,14 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case COMPRESSION_ZSTD:
+ if (compresslevel > 22)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 22",
+ compresslevel, "zstd");
+ exit(1);
+ }
+ break;
}
/*
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ccb215c398..9b7656c692 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -904,6 +904,10 @@ main(int argc, char **argv)
exit(1);
#endif
break;
+ case COMPRESSION_ZSTD:
+ pg_log_error("compression with %s is not yet supported", "ZSTD");
+ exit(1);
+
}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 2dfb353baa..ec54019cfc 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -24,6 +24,7 @@ typedef enum
{
COMPRESSION_GZIP,
COMPRESSION_LZ4,
+ COMPRESSION_ZSTD,
COMPRESSION_NONE
} WalCompressionMethod;
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 851233a6e0..596df15118 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -10,6 +10,7 @@ export TAR
# name.
export GZIP_PROGRAM=$(GZIP)
export LZ4=$(LZ4)
+export ZSTD=$(ZSTD)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
old mode 100644
new mode 100755
index 6927ca4c74..1ccc6cb9df
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -43,6 +43,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d', '-m'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
@@ -108,6 +116,7 @@ for my $tc (@test_configuration)
# Cleanup.
unlink($backup_path . '/backup_manifest');
unlink($backup_path . '/base.tar');
+ unlink($backup_path . '/' . $tc->{'backup_archive'});
rmtree($extract_path);
}
}
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a3f8d37258..a7f16758a4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
2.25.1
v13-0003-ZSTD-add-client-side-compression-support.patch
From 60408e8186d35f979071dc69ee556cc15c7f85b0 Mon Sep 17 00:00:00 2001
From: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
Date: Wed, 16 Feb 2022 22:22:27 +0530
Subject: [PATCH 3/4] ZSTD: add client-side compression support.
ZSTD compression can now be performed on the client using
pg_basebackup -Ft --compress client-zstd[:LEVEL].
Example:
pg_basebackup -D /tmp/zstd_client -Ft -Xnone --compress=client-zstd
---
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 2 +
src/bin/pg_basebackup/bbstreamer_zstd.c | 202 ++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 28 ++-
src/bin/pg_verifybackup/t/010_client_untar.pl | 8 +
src/tools/msvc/Mkvcbuild.pm | 1 +
6 files changed, 240 insertions(+), 2 deletions(-)
create mode 100644 src/bin/pg_basebackup/bbstreamer_zstd.c
mode change 100644 => 100755 src/bin/pg_verifybackup/t/010_client_untar.pl
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 1d0db4f9d0..0035ebcef5 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -44,6 +44,7 @@ BBOBJS = \
bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_lz4.o \
+ bbstreamer_zstd.o \
bbstreamer_tar.o
all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index c2de77bacc..bfc624a863 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -209,6 +209,8 @@ extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
int compresslevel);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
+ int compresslevel);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
new file mode 100644
index 0000000000..0b20267cf4
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -0,0 +1,202 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_zstd.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_zstd.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbstreamer_zstd_frame
+{
+ bbstreamer base;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbstreamer_zstd_frame;
+
+static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
+ .content = bbstreamer_zstd_compressor_content,
+ .finalize = bbstreamer_zstd_compressor_finalize,
+ .free = bbstreamer_zstd_compressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs zstd compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_compressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, ZSTD_DStreamOutSize());
+
+ streamer->cctx = ZSTD_createCCtx();
+ if (!streamer->cctx)
+ pg_log_error("could not create zstd compression context");
+
+ /* Initialize stream compression preferences */
+ ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compresslevel);
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support zstd compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Compress the input data to output buffer.
+ *
+ * Find out the compression bound based on input data length for each
+ * invocation to make sure that output buffer has enough capacity to
+ * accommodate the compressed data. In case if the output buffer
+ * capacity falls short of compression bound then forward the content
+ * of output buffer to next streamer and empty the buffer.
+ */
+static void
+bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the output buffer is not left with enough space, send the
+ * compressed bytes to the next streamer, and empty the buffer.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx, &mystreamer->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the output buffer is not left with enough space, send the
+ * compressed bytes to the next streamer, and empty the buffer.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx,
+ &mystreamer->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next streamer. */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeCCtx(mystreamer->cctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 304d510220..18bd0df9a5 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1024,6 +1024,16 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ }
+ else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
{
*methodres = COMPRESSION_ZSTD;
@@ -1147,7 +1157,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool inject_manifest;
bool is_tar,
is_tar_gz,
- is_tar_lz4;
+ is_tar_lz4,
+ is_tar_zstd;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1170,6 +1181,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_lz4 = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+ /* Is this a ZSTD archive? */
+ is_tar_zstd = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1179,7 +1194,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4)
+ if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4
+ && !is_tar_zstd)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1251,6 +1267,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_lz4_compressor_new(streamer,
compresslevel);
}
+ else if (compressmethod == COMPRESSION_ZSTD)
+ {
+ strlcat(archive_filename, ".zst", sizeof(archive_filename));
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
+ streamer = bbstreamer_zstd_compressor_new(streamer,
+ compresslevel);
+ }
else
{
Assert(false); /* not reachable */
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
old mode 100644
new mode 100755
index 3616529390..c2a6161be6
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -42,6 +42,14 @@ my @test_configuration = (
'decompress_flags' => [ '-d' ],
'output_file' => 'base.tar',
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:5'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index bab81bd459..901e755d01 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -380,6 +380,7 @@ sub mkvcbuild
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_gzip.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_inject.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_lz4.c');
+ $pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_zstd.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_tar.c');
$pgbasebackup->AddLibrary('ws2_32.lib');
--
2.25.1
Hi,
It will be good if we can also fix CreateWalTarMethod to support LZ4 and ZSTD.

Ok, we will see; either Dipesh or I will take care of it.
I took a look at CreateWalTarMethod with an eye to supporting LZ4 compression
for WAL files. The current implementation backs up a WAL file to a tar archive
in three steps. For each file:
1. It first writes the header in the function tar_open_for_write, flushes the
contents of the tar to disk, and stores the header offset.
2. Next, the contents of the WAL file are written to the tar archive.
3. Finally, it recalculates the checksum in tar_close() and overwrites the
header at the offset stored in step #1.
The need to overwrite the header in CreateWalTarMethod arises mainly from
partial WAL files, where the size of the WAL file < WalSegSize. Such a file is
padded, and the checksum is recalculated after the pad bytes are added.
If we go ahead and implement LZ4 support for CreateWalTarMethod, then we have
a problem at step #3. To achieve a better compression ratio, compressed LZ4
blocks are linked to each other, and the blocks are decoded sequentially. If
we overwrite the header as part of step #3, we corrupt the links between the
compressed LZ4 blocks. LZ4 does provide an option to write each compressed
block independently (by setting blockMode to LZ4F_blockIndependent), but that
is still a problem, because we don't know that overwriting the header after
recalculating the checksum won't overlap the boundary of the next block.
GZIP manages to overcome this problem because zlib provides a way to turn
compression on and off on the fly while writing a compressed archive, via the
library function deflateParams(). The current gzip implementation for
CreateWalTarMethod uses this function to turn off compression just before step
#1 and writes the uncompressed header, whose size is equal to TAR_BLOCK_SIZE.
It uses the same function to turn compression back on for writing the contents
of the WAL file in step #2, and turns it off again just before step #3 to
overwrite the header. The header is overwritten at the same offset, again with
size equal to TAR_BLOCK_SIZE.
Since GZIP provides this way to enable and disable compression, it is possible
to control the size of the data we write to the compressed archive; even if we
overwrite an already-written block, there is no risk of overlapping the
boundary of the next block. No equivalent mechanism is available in LZ4 or
ZSTD.
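For reference, here is a minimal sketch of that zlib trick, assuming a modern
zlib -- this is not the actual walmethods.c code; error handling and the
surrounding tar bookkeeping are omitted, and "zs" is assumed to be a z_stream
already set up with deflateInit():

#include <stdio.h>
#include <zlib.h>

static void
write_stored_block(z_stream *zs, FILE *fp, const char *buf, size_t len)
{
    char        out[4096];

    /* Turn compression off; deflate() now emits stored (raw) blocks. */
    deflateParams(zs, 0, Z_DEFAULT_STRATEGY);

    zs->next_in = (Bytef *) buf;
    zs->avail_in = (uInt) len;
    do
    {
        zs->next_out = (Bytef *) out;
        zs->avail_out = sizeof(out);
        /* Z_SYNC_FLUSH forces the bytes out at a predictable offset. */
        deflate(zs, Z_SYNC_FLUSH);
        fwrite(out, 1, sizeof(out) - zs->avail_out, fp);
    } while (zs->avail_in > 0);

    /* Turn compression back on for the WAL file contents. */
    deflateParams(zs, Z_DEFAULT_COMPRESSION, Z_DEFAULT_STRATEGY);
}

Because the stored block has a known size, the caller can later seek back to
its offset and overwrite those TAR_BLOCK_SIZE bytes without disturbing any of
the compressed data around them.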
To support LZ4 and ZSTD compression in CreateWalTarMethod, then, we may need
to refactor this code, unless I am missing something. We would need to add the
padding bytes for a partial WAL file before sending it to the compressed
archive. That way, every file being compressed already has size equal to
WalSegSize, so no padding is needed afterwards: there is no need to
recalculate the checksum, and we can avoid overwriting the header in step #3
entirely.
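Roughly, the write path could then look like this sketch (all of the helper
names here are hypothetical, for illustration only; they are not existing
walmethods.c functions):

#include <stddef.h>

/* Hypothetical helpers, assumed to append bytes to the compressed archive. */
extern void emit_tar_header(const char *name, size_t size);
extern void emit_data(const char *data, size_t len);
extern void emit_zeroes(size_t len);

static void
archive_wal_file(const char *name, const char *data, size_t len,
                 size_t wal_seg_size)
{
    /*
     * Pad before compressing: the header is emitted exactly once, with its
     * final size and checksum, so it never has to be rewritten.
     */
    emit_tar_header(name, wal_seg_size);

    emit_data(data, len);
    if (len < wal_seg_size)
        emit_zeroes(wal_seg_size - len);    /* pad a partial WAL file */
}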
Thoughts?
Thanks,
Dipesh
On Fri, Mar 4, 2022 at 3:32 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
GZIP manages to overcome this problem because zlib provides a way to turn
compression on and off on the fly while writing a compressed archive, via the
library function deflateParams(). The current gzip implementation for
CreateWalTarMethod uses this function to turn off compression just before step
#1 and writes the uncompressed header, whose size is equal to TAR_BLOCK_SIZE.
It uses the same function to turn compression back on for writing the contents
of the WAL file in step #2, and turns it off again just before step #3 to
overwrite the header. The header is overwritten at the same offset, again with
size equal to TAR_BLOCK_SIZE.
This is a real mess. To me, it seems like a pretty big hack to use
deflateParams() to shut off compression in the middle of the
compressed data stream so that we can go back and overwrite that part
of the data later. It appears that the only reason we need that hack
is that we don't know the file size starting out. Except we kind of
do know the size, because pad_to_size specifies a minimum size for the
file. It's true that the maximum file size is unbounded, but I'm not
sure why that's important. I wonder if anyone else has an idea why we
didn't just set the file size to pad_to_size exactly when we write the
tar header the first time, instead of this IMHO kind of nutty approach
where we back up. I'd try to figure it out from the comments, but
there basically aren't any. I also had a look at the relevant commit
messages and didn't see anything relevant there either. If I'm missing
something, please point it out.
While I'm complaining, I noticed while looking at this code that it is
documented that "The caller must ensure that only one method is
instantiated in any given program, and that it's only instantiated
once!" As far as I can see, this is because somebody thought about
putting all of the relevant data into a struct and then decided on an
alternative strategy of storing some of it there, and the rest in a
global variable. I can't quite imagine why anyone would think that was
a good idea. There may be some reason that I can't see right now, but
here again there appear to be no relevant code comments.
I'm somewhat inclined to wonder whether we could just get rid of
walmethods.c entirely and use the new bbstreamer stuff instead. That
code also knows how to write plain files into a directory, and write
tar archives, and compress stuff, but in my totally biased opinion as
the author of most of that code, it's better code. It has no
restriction on using at most one method per program, or of
instantiating that method only once, and it already has LZ4 support,
and there's a pending patch for ZSTD support that I intend to get
committed soon as well. It also has, and I know I might be beating a
dead horse here, comments. Now, admittedly, it does need to know the
size of each archive member up front in order to work, so if we can't
solve the problem then we can't go this route. But if we can't solve
that problem, then we also can't add LZ4 and ZSTD support to
walmethods.c, because random access to compressed data is not really a
thing, even if we hacked it to work for gzip.
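To spell out why the size is needed up front: a POSIX tar header stores the
member size as an octal text field that precedes the member's data, along the
lines of this sketch (offset per the ustar format; this is not bbstreamer
code):

#include <stdio.h>

static void
set_tar_size_field(char header[512], unsigned long long size)
{
    /* Bytes 124..135 of a ustar header hold the size, as octal text. */
    snprintf(header + 124, 12, "%011llo", size);
}

So the size is committed to the output stream before the first data byte,
which is why you must either know it in advance or come back and rewrite the
header afterwards -- and rewriting is exactly what compressed output can't
tolerate.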
Thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Feb 16, 2022 at 8:46 PM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
Thanks for the comments Robert. I have addressed your comments in the
attached patch v13-0002-ZSTD-add-server-side-compression-support.patch.
Rest of the patches are similar to v12, but just bumped the version number.
OK, here's a consolidated patch with all your changes from 0002-0004
as 0001 plus a few proposed edits of my own in 0002. By and large I
think this is fine.
My proposed changes are largely cosmetic, but one thing that isn't is
revising the size - pos <= bound tests to instead check size - pos <
bound. My reasoning for that change is: if the number of bytes
remaining in the buffer is exactly equal to the maximum number we can
write, we don't need to flush it yet. If that sounds correct, we
should fix the LZ4 code the same way.
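In code form, the distinction is just this (illustrative names, not the patch
itself):

#include <stdbool.h>
#include <stddef.h>

/*
 * Must the output buffer be flushed before a write of at most "bound"
 * bytes? Flush only when the worst case cannot fit; when the remaining
 * space equals "bound" exactly, the write still fits, so flushing would
 * be one call too early (harmless, but wasteful).
 */
static bool
needs_flush(size_t buf_size, size_t buf_used, size_t bound)
{
    return buf_size - buf_used < bound;
}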
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v14-0002-My-changes.patch
From 76a910744597ab95cabbbfc68872832f18289aa1 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 7 Mar 2022 16:20:32 -0500
Subject: [PATCH v14 2/2] My changes.
---
doc/src/sgml/ref/pg_basebackup.sgml | 7 ++---
src/backend/replication/basebackup_zstd.c | 33 +++++++++++++----------
src/bin/pg_basebackup/bbstreamer_zstd.c | 23 +++++++++-------
3 files changed, 36 insertions(+), 27 deletions(-)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 4cf28a2a61..4a630b59b7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -431,7 +431,7 @@ PostgreSQL documentation
When the tar format is used with <literal>gzip</literal>,
<literal>lz4</literal>, or <literal>zstd</literal>, the suffix
<filename>.gz</filename>, <filename>.lz4</filename>, or
- <filename>.zst</filename> respectively will be automatically added to
+ <filename>.zst</filename>, respectively, will be automatically added to
all tar filenames. When the plain format is used, client-side
compression may not be specified, but it is still possible to request
server-side compression. If this is done, the server will compress the
@@ -441,8 +441,9 @@ PostgreSQL documentation
When this option is used in combination with
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
- compression is selected, but will not be compressed if server-side
- compression, LZ4, or ZSTD compression is selected.
+ compression is selected, but will not be compressed if any other
+ compression algorithm is selected, or if server-side compression
+ is selected.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 24993a5bb6..e3f9b1d4dc 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -172,18 +172,19 @@ bbsink_zstd_archive_contents(bbsink *sink, size_t len)
while (inBuf.pos < inBuf.size)
{
size_t yet_to_flush;
- size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+ size_t max_needed = ZSTD_compressBound(inBuf.size - inBuf.pos);
/*
* If the out buffer is not left with enough space, send the output
* buffer to the next sink, and reset it.
*/
- if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
- required_outBuf_bound)
+ if (mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos < max_needed)
{
- bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ bbsink_archive_contents(mysink->base.bbs_next,
+ mysink->zstd_outBuf.pos);
mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
- mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.size =
+ mysink->base.bbs_next->bbs_buffer_length;
mysink->zstd_outBuf.pos = 0;
}
@@ -191,7 +192,9 @@ bbsink_zstd_archive_contents(bbsink *sink, size_t len)
&inBuf, ZSTD_e_continue);
if (ZSTD_isError(yet_to_flush))
- elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ elog(ERROR,
+ "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
}
}
@@ -211,18 +214,19 @@ bbsink_zstd_end_archive(bbsink *sink)
do
{
ZSTD_inBuffer in = {NULL, 0, 0};
- size_t required_outBuf_bound = ZSTD_compressBound(0);
+ size_t max_needed = ZSTD_compressBound(0);
/*
* If the out buffer is not left with enough space, send the output
* buffer to the next sink, and reset it.
*/
- if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
- required_outBuf_bound)
+ if (mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos < max_needed)
{
- bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ bbsink_archive_contents(mysink->base.bbs_next,
+ mysink->zstd_outBuf.pos);
mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
- mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.size =
+ mysink->base.bbs_next->bbs_buffer_length;
mysink->zstd_outBuf.pos = 0;
}
@@ -238,7 +242,8 @@ bbsink_zstd_end_archive(bbsink *sink)
/* Make sure to pass any remaining bytes to the next sink. */
if (mysink->zstd_outBuf.pos > 0)
- bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ bbsink_archive_contents(mysink->base.bbs_next,
+ mysink->zstd_outBuf.pos);
/* Pass on the information that this archive has ended. */
bbsink_forward_end_archive(sink);
@@ -275,8 +280,8 @@ bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
}
/*
- * In case the backup fails, make sure we free the compression context by
- * calling ZSTD_freeCCtx if needed to avoid memory leak.
+ * In case the backup fails, make sure we free any compression context that
+ * got allocated, so that we don't leak memory.
*/
static void
bbsink_zstd_cleanup(bbsink *sink)
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 83b59d63ba..cc68367dd5 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -121,14 +121,14 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
while (inBuf.pos < inBuf.size)
{
size_t yet_to_flush;
- size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+ size_t max_needed = ZSTD_compressBound(inBuf.size - inBuf.pos);
/*
* If the output buffer is not left with enough space, send the
* compressed bytes to the next streamer, and empty the buffer.
*/
- if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
- required_outBuf_bound)
+ if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
+ max_needed)
{
bbstreamer_content(mystreamer->base.bbs_next, member,
mystreamer->zstd_outBuf.dst,
@@ -141,11 +141,13 @@ bbstreamer_zstd_compressor_content(bbstreamer *streamer,
mystreamer->zstd_outBuf.pos = 0;
}
- yet_to_flush = ZSTD_compressStream2(mystreamer->cctx, &mystreamer->zstd_outBuf,
- &inBuf, ZSTD_e_continue);
+ yet_to_flush =
+ ZSTD_compressStream2(mystreamer->cctx, &mystreamer->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
if (ZSTD_isError(yet_to_flush))
- pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ pg_log_error("could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
}
}
@@ -161,14 +163,14 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
do
{
ZSTD_inBuffer in = {NULL, 0, 0};
- size_t required_outBuf_bound = ZSTD_compressBound(0);
+ size_t max_needed = ZSTD_compressBound(0);
/*
* If the output buffer is not left with enough space, send the
* compressed bytes to the next streamer, and empty the buffer.
*/
- if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
- required_outBuf_bound)
+ if (mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos <
+ max_needed)
{
bbstreamer_content(mystreamer->base.bbs_next, NULL,
mystreamer->zstd_outBuf.dst,
@@ -186,7 +188,8 @@ bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
&in, ZSTD_e_end);
if (ZSTD_isError(yet_to_flush))
- pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ pg_log_error("could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
} while (yet_to_flush > 0);
--
2.24.3 (Apple Git-128)
v14-0001-Patches-from-JL.patchapplication/octet-stream; name=v14-0001-Patches-from-JL.patchDownload
From b22f46c645a999ec7bab702bb418a6c5e6bf234e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 7 Mar 2022 15:08:45 -0500
Subject: [PATCH v14 1/2] Patches from JL.
---
doc/src/sgml/protocol.sgml | 7 +-
doc/src/sgml/ref/pg_basebackup.sgml | 38 +-
src/backend/replication/Makefile | 1 +
src/backend/replication/basebackup.c | 7 +-
src/backend/replication/basebackup_zstd.c | 294 +++++++++++++++
src/bin/pg_basebackup/Makefile | 1 +
src/bin/pg_basebackup/bbstreamer.h | 3 +
src/bin/pg_basebackup/bbstreamer_zstd.c | 335 ++++++++++++++++++
src/bin/pg_basebackup/pg_basebackup.c | 49 ++-
src/bin/pg_basebackup/pg_receivewal.c | 4 +
src/bin/pg_basebackup/walmethods.h | 1 +
src/bin/pg_verifybackup/Makefile | 1 +
src/bin/pg_verifybackup/t/008_untar.pl | 9 +
src/bin/pg_verifybackup/t/009_extract.pl | 5 +
src/bin/pg_verifybackup/t/010_client_untar.pl | 8 +
src/include/replication/basebackup_sink.h | 1 +
src/tools/msvc/Mkvcbuild.pm | 1 +
17 files changed, 740 insertions(+), 25 deletions(-)
create mode 100644 src/backend/replication/basebackup_zstd.c
create mode 100644 src/bin/pg_basebackup/bbstreamer_zstd.c
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index c51c4254a7..0695bcd423 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2724,8 +2724,8 @@ The commands accepted in replication mode are:
<listitem>
<para>
Instructs the server to compress the backup using the specified
- method. Currently, the supported methods are <literal>gzip</literal>
- and <literal>lz4</literal>.
+ method. Currently, the supported methods are <literal>gzip</literal>,
+ <literal>lz4</literal>, and <literal>zstd</literal>.
</para>
</listitem>
</varlistentry>
@@ -2737,7 +2737,8 @@ The commands accepted in replication mode are:
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
For <literal>gzip</literal> the value should be an integer between 1
- and 9, and for <literal>lz4</literal> it should be between 1 and 12.
+ and 9, for <literal>lz4</literal> between 1 and 12, and for
+ <literal>zstd</literal> it should be between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd1..4cf28a2a61 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -417,30 +417,32 @@ PostgreSQL documentation
specify <literal>-Xfetch</literal>.
</para>
<para>
- The compression method can be set to <literal>gzip</literal> or
- <literal>lz4</literal>, or <literal>none</literal> for no
- compression. A compression level can be optionally specified, by
- appending the level number after a colon (<literal>:</literal>). If no
- level is specified, the default compression level will be used. If
- only a level is specified without mentioning an algorithm,
- <literal>gzip</literal> compression will be used if the level is
- greater than 0, and no compression will be used if the level is 0.
- </para>
- <para>
- When the tar format is used with <literal>gzip</literal> or
- <literal>lz4</literal>, the suffix <filename>.gz</filename> or
- <filename>.lz4</filename> will automatically be added to all tar
- filenames. When the plain format is used, client-side compression may
- not be specified, but it is still possible to request server-side
- compression. If this is done, the server will compress the backup for
- transmission, and the client will decompress and extract it.
+ The compression method can be set to <literal>gzip</literal>,
+ <literal>lz4</literal>, <literal>zstd</literal>, or
+ <literal>none</literal> for no compression. A compression level can
+ optionally be specified, by appending the level number after a colon
+ (<literal>:</literal>). If no level is specified, the default
+ compression level will be used. If only a level is specified without
+ mentioning an algorithm, <literal>gzip</literal> compression will be
+ used if the level is greater than 0, and no compression will be used if
+ the level is 0.
+ </para>
+ <para>
+ When the tar format is used with <literal>gzip</literal>,
+ <literal>lz4</literal>, or <literal>zstd</literal>, the suffix
+ <filename>.gz</filename>, <filename>.lz4</filename>, or
+ <filename>.zst</filename> respectively will be automatically added to
+ all tar filenames. When the plain format is used, client-side
+ compression may not be specified, but it is still possible to request
+ server-side compression. If this is done, the server will compress the
+ backup for transmission, and the client will decompress and extract it.
</para>
<para>
When this option is used in combination with
<literal>-Xstream</literal>, <literal>pg_wal.tar</literal> will
be compressed using <literal>gzip</literal> if client-side gzip
compression is selected, but will not be compressed if server-side
- compresion or LZ4 compresion is selected.
+ compression, LZ4, or ZSTD compression is selected.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 74043ff331..2e6de7007f 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -20,6 +20,7 @@ OBJS = \
basebackup_copy.o \
basebackup_gzip.o \
basebackup_lz4.o \
+ basebackup_zstd.o \
basebackup_progress.o \
basebackup_server.o \
basebackup_sink.o \
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0bf28b55d7..2378ce5c5e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -64,7 +64,8 @@ typedef enum
{
BACKUP_COMPRESSION_NONE,
BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
} basebackup_compression_type;
typedef struct
@@ -906,6 +907,8 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression = BACKUP_COMPRESSION_GZIP;
else if (strcmp(optval, "lz4") == 0)
opt->compression = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(optval, "zstd") == 0)
+ opt->compression = BACKUP_COMPRESSION_ZSTD;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1026,6 +1029,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
sink = bbsink_gzip_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink, opt.compression_level);
+ else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
+ sink = bbsink_zstd_new(sink, opt.compression_level);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
new file mode 100644
index 0000000000..24993a5bb6
--- /dev/null
+++ b/src/backend/replication/basebackup_zstd.c
@@ -0,0 +1,294 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup_zstd.c
+ * Basebackup sink implementing zstd compression.
+ *
+ * Portions Copyright (c) 2010-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/basebackup_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/basebackup_sink.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbsink_zstd
+{
+ /* Common information for all types of sink. */
+ bbsink base;
+
+ /* Compression level */
+ int compresslevel;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbsink_zstd;
+
+static void bbsink_zstd_begin_backup(bbsink *sink);
+static void bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name);
+static void bbsink_zstd_archive_contents(bbsink *sink, size_t avail_in);
+static void bbsink_zstd_manifest_contents(bbsink *sink, size_t len);
+static void bbsink_zstd_end_archive(bbsink *sink);
+static void bbsink_zstd_cleanup(bbsink *sink);
+static void bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli);
+
+const bbsink_ops bbsink_zstd_ops = {
+ .begin_backup = bbsink_zstd_begin_backup,
+ .begin_archive = bbsink_zstd_begin_archive,
+ .archive_contents = bbsink_zstd_archive_contents,
+ .end_archive = bbsink_zstd_end_archive,
+ .begin_manifest = bbsink_forward_begin_manifest,
+ .manifest_contents = bbsink_zstd_manifest_contents,
+ .end_manifest = bbsink_forward_end_manifest,
+ .end_backup = bbsink_zstd_end_backup,
+ .cleanup = bbsink_zstd_cleanup
+};
+#endif
+
+/*
+ * Create a new basebackup sink that performs zstd compression using the
+ * designated compression level.
+ */
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression is not supported by this build")));
+ return NULL; /* keep compiler quiet */
+#else
+ bbsink_zstd *sink;
+
+ Assert(next != NULL);
+
+ if (compresslevel < 0 || compresslevel > 22)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("zstd compression level %d is out of range",
+ compresslevel)));
+
+ sink = palloc0(sizeof(bbsink_zstd));
+ *((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
+ sink->base.bbs_next = next;
+ sink->compresslevel = compresslevel;
+
+ return &sink->base;
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+
+/*
+ * Begin backup.
+ */
+static void
+bbsink_zstd_begin_backup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t output_buffer_bound;
+
+ mysink->cctx = ZSTD_createCCtx();
+ if (!mysink->cctx)
+ elog(ERROR, "could not create zstd compression context");
+
+ ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+
+ /*
+ * We need our own buffer, because we're going to pass different data to
+ * the next sink than what gets passed to us.
+ */
+ mysink->base.bbs_buffer = palloc(mysink->base.bbs_buffer_length);
+
+ /*
+ * Make sure that the next sink's bbs_buffer is big enough to accommodate
+ * the compressed input buffer.
+ */
+ output_buffer_bound = ZSTD_compressBound(mysink->base.bbs_buffer_length);
+
+ /*
+ * The buffer length is expected to be a multiple of BLCKSZ, so round up.
+ */
+ output_buffer_bound = output_buffer_bound + BLCKSZ -
+ (output_buffer_bound % BLCKSZ);
+
+ bbsink_begin_backup(sink->bbs_next, sink->bbs_state, output_buffer_bound);
+}
+
+/*
+ * Prepare to compress the next archive.
+ */
+static void
+bbsink_zstd_begin_archive(bbsink *sink, const char *archive_name)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ char *zstd_archive_name;
+
+ /*
+ * At the start of each archive we reset the state to start a new
+ * compression operation. The parameters are sticky and they will stick
+ * around as we are resetting with option ZSTD_reset_session_only.
+ */
+ ZSTD_CCtx_reset(mysink->cctx, ZSTD_reset_session_only);
+
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+
+ /* Add ".zst" to the archive name. */
+ zstd_archive_name = psprintf("%s.zst", archive_name);
+ Assert(sink->bbs_next != NULL);
+ bbsink_begin_archive(sink->bbs_next, zstd_archive_name);
+ pfree(zstd_archive_name);
+}
+
+/*
+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for the next sink.
+ *
+ * Note that since we're compressing the input, it may very commonly happen
+ * that we consume all the input data without filling the output buffer. In
+ * that case, the compressed representation of the current input data won't
+ * actually be sent to the next bbsink until a later call to this function,
+ * or perhaps even not until bbsink_zstd_end_archive() is invoked.
+ */
+static void
+bbsink_zstd_archive_contents(bbsink *sink, size_t len)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ ZSTD_inBuffer inBuf = {mysink->base.bbs_buffer, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx, &mysink->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * There might be some data inside zstd's internal buffers; we need to get that
+ * flushed out, also end the zstd frame and then get that forwarded to the
+ * successor sink as archive content.
+ *
+ * Then we can end processing for this archive.
+ */
+static void
+bbsink_zstd_end_archive(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the out buffer is not left with enough space, send the output
+ * buffer to the next sink, and reset it.
+ */
+ if ((mysink->zstd_outBuf.size - mysink->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+ mysink->zstd_outBuf.dst = mysink->base.bbs_next->bbs_buffer;
+ mysink->zstd_outBuf.size = mysink->base.bbs_next->bbs_buffer_length;
+ mysink->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mysink->cctx,
+ &mysink->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ elog(ERROR, "could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next sink. */
+ if (mysink->zstd_outBuf.pos > 0)
+ bbsink_archive_contents(mysink->base.bbs_next, mysink->zstd_outBuf.pos);
+
+ /* Pass on the information that this archive has ended. */
+ bbsink_forward_end_archive(sink);
+}
+
+/*
+ * Free the resources and context.
+ */
+static void
+bbsink_zstd_end_backup(bbsink *sink, XLogRecPtr endptr,
+ TimeLineID endtli)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+
+ bbsink_forward_end_backup(sink, endptr, endtli);
+}
+
+/*
+ * Manifest contents are not compressed, but we do need to copy them into
+ * the successor sink's buffer, because we have our own.
+ */
+static void
+bbsink_zstd_manifest_contents(bbsink *sink, size_t len)
+{
+ memcpy(sink->bbs_next->bbs_buffer, sink->bbs_buffer, len);
+ bbsink_manifest_contents(sink->bbs_next, len);
+}
+
+/*
+ * In case the backup fails, make sure we free the compression context by
+ * calling ZSTD_freeCCtx if needed to avoid memory leak.
+ */
+static void
+bbsink_zstd_cleanup(bbsink *sink)
+{
+ bbsink_zstd *mysink = (bbsink_zstd *) sink;
+
+ /* Release the context if not already released. */
+ if (mysink->cctx)
+ {
+ ZSTD_freeCCtx(mysink->cctx);
+ mysink->cctx = NULL;
+ }
+}
+
+#endif
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 1d0db4f9d0..0035ebcef5 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -44,6 +44,7 @@ BBOBJS = \
bbstreamer_gzip.o \
bbstreamer_inject.o \
bbstreamer_lz4.o \
+ bbstreamer_zstd.o \
bbstreamer_tar.o
all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index c2de77bacc..02d4c05df6 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -209,6 +209,9 @@ extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
int compresslevel);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
new file mode 100644
index 0000000000..83b59d63ba
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -0,0 +1,335 @@
+/*-------------------------------------------------------------------------
+ *
+ * bbstreamer_zstd.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_basebackup/bbstreamer_zstd.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#ifdef HAVE_LIBZSTD
+#include <zstd.h>
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+
+#ifdef HAVE_LIBZSTD
+
+typedef struct bbstreamer_zstd_frame
+{
+ bbstreamer base;
+
+ ZSTD_CCtx *cctx;
+ ZSTD_DCtx *dctx;
+ ZSTD_outBuffer zstd_outBuf;
+} bbstreamer_zstd_frame;
+
+static void bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_compressor_ops = {
+ .content = bbstreamer_zstd_compressor_content,
+ .finalize = bbstreamer_zstd_compressor_finalize,
+ .free = bbstreamer_zstd_compressor_free
+};
+
+static void bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context);
+static void bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_zstd_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
+ .content = bbstreamer_zstd_decompressor_content,
+ .finalize = bbstreamer_zstd_decompressor_finalize,
+ .free = bbstreamer_zstd_decompressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs zstd compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_compressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, ZSTD_DStreamOutSize());
+
+ streamer->cctx = ZSTD_createCCtx();
+ if (!streamer->cctx)
+ pg_log_error("could not create zstd compression context");
+
+ /* Initialize stream compression preferences */
+ ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compresslevel);
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support zstd compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Compress the input data to output buffer.
+ *
+ * Find out the compression bound based on input data length for each
+ * invocation to make sure that output buffer has enough capacity to
+ * accommodate the compressed data. In case if the output buffer
+ * capacity falls short of compression bound then forward the content
+ * of output buffer to next streamer and empty the buffer.
+ */
+static void
+bbstreamer_zstd_compressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t yet_to_flush;
+ size_t required_outBuf_bound = ZSTD_compressBound(inBuf.size - inBuf.pos);
+
+ /*
+ * If the output buffer is not left with enough space, send the
+ * compressed bytes to the next streamer, and empty the buffer.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx, &mystreamer->zstd_outBuf,
+ &inBuf, ZSTD_e_continue);
+
+ if (ZSTD_isError(yet_to_flush))
+ pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_compressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ size_t yet_to_flush;
+
+ do
+ {
+ ZSTD_inBuffer in = {NULL, 0, 0};
+ size_t required_outBuf_bound = ZSTD_compressBound(0);
+
+ /*
+ * If the output buffer is not left with enough space, send the
+ * compressed bytes to the next streamer, and empty the buffer.
+ */
+ if ((mystreamer->zstd_outBuf.size - mystreamer->zstd_outBuf.pos) <=
+ required_outBuf_bound)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ yet_to_flush = ZSTD_compressStream2(mystreamer->cctx,
+ &mystreamer->zstd_outBuf,
+ &in, ZSTD_e_end);
+
+ if (ZSTD_isError(yet_to_flush))
+ pg_log_error("could not compress data: %s", ZSTD_getErrorName(yet_to_flush));
+
+ } while (yet_to_flush > 0);
+
+ /* Make sure to pass any remaining bytes to the next streamer. */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_compressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeCCtx(mystreamer->cctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
+
+/*
+ * Create a new base backup streamer that performs decompression of zstd
+ * compressed blocks.
+ */
+bbstreamer *
+bbstreamer_zstd_decompressor_new(bbstreamer *next)
+{
+#ifdef HAVE_LIBZSTD
+ bbstreamer_zstd_frame *streamer;
+
+ Assert(next != NULL);
+
+ streamer = palloc0(sizeof(bbstreamer_zstd_frame));
+ *((const bbstreamer_ops **) &streamer->base.bbs_ops) =
+ &bbstreamer_zstd_decompressor_ops;
+
+ streamer->base.bbs_next = next;
+ initStringInfo(&streamer->base.bbs_buffer);
+ enlargeStringInfo(&streamer->base.bbs_buffer, ZSTD_DStreamOutSize());
+
+ streamer->dctx = ZSTD_createDCtx();
+ if (!streamer->dctx)
+ {
+ pg_log_error("could not create zstd decompression context");
+ exit(1);
+ }
+
+ /* Initialize the ZSTD output buffer. */
+ streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
+ streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
+ streamer->zstd_outBuf.pos = 0;
+
+ return &streamer->base;
+#else
+ pg_log_error("this build does not support compression");
+ exit(1);
+#endif
+}
+
+#ifdef HAVE_LIBZSTD
+/*
+ * Decompress the input data to output buffer until we run out of input
+ * data. Each time the output buffer is full, pass on the decompressed data
+ * to the next streamer.
+ */
+static void
+bbstreamer_zstd_decompressor_content(bbstreamer *streamer,
+ bbstreamer_member *member,
+ const char *data, int len,
+ bbstreamer_archive_context context)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+ ZSTD_inBuffer inBuf = {data, len, 0};
+
+ while (inBuf.pos < inBuf.size)
+ {
+ size_t ret;
+
+ /*
+ * If output buffer is full then forward the content to next streamer
+ * and update the output buffer.
+ */
+ if (mystreamer->zstd_outBuf.pos >= mystreamer->zstd_outBuf.size)
+ {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
+ mystreamer->zstd_outBuf.dst,
+ mystreamer->zstd_outBuf.pos,
+ context);
+
+ /* Reset the ZSTD output buffer. */
+ mystreamer->zstd_outBuf.dst = mystreamer->base.bbs_buffer.data;
+ mystreamer->zstd_outBuf.size = mystreamer->base.bbs_buffer.maxlen;
+ mystreamer->zstd_outBuf.pos = 0;
+ }
+
+ ret = ZSTD_decompressStream(mystreamer->dctx,
+ &mystreamer->zstd_outBuf, &inBuf);
+
+ if (ZSTD_isError(ret))
+ pg_log_error("could not decompress data: %s", ZSTD_getErrorName(ret));
+ }
+}
+
+/*
+ * End-of-stream processing.
+ */
+static void
+bbstreamer_zstd_decompressor_finalize(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ /*
+ * End of the stream, if there is some pending data in output buffers then
+ * we must forward it to next streamer.
+ */
+ if (mystreamer->zstd_outBuf.pos > 0)
+ bbstreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->base.bbs_buffer.maxlen,
+ BBSTREAMER_UNKNOWN);
+
+ bbstreamer_finalize(mystreamer->base.bbs_next);
+}
+
+/*
+ * Free memory.
+ */
+static void
+bbstreamer_zstd_decompressor_free(bbstreamer *streamer)
+{
+ bbstreamer_zstd_frame *mystreamer = (bbstreamer_zstd_frame *) streamer;
+
+ bbstreamer_free(streamer->bbs_next);
+ ZSTD_freeDCtx(mystreamer->dctx);
+ pfree(streamer->bbs_buffer.data);
+ pfree(streamer);
+}
+#endif
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index c1ed7aeeee..9f3ecc60fb 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -405,8 +405,9 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
+ printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"
" compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=none do not compress tar output\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
" set fast or spread checkpointing\n"));
@@ -1067,6 +1068,21 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
*methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_SERVER;
}
+ else if (pg_strcasecmp(firstpart, "zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ }
+ else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_CLIENT;
+ }
+ else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
+ {
+ *methodres = COMPRESSION_ZSTD;
+ *locationres = COMPRESS_LOCATION_SERVER;
+ }
else if (pg_strcasecmp(firstpart, "none") == 0)
{
*methodres = COMPRESSION_NONE;
@@ -1191,7 +1207,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool inject_manifest;
bool is_tar,
is_tar_gz,
- is_tar_lz4;
+ is_tar_lz4,
+ is_tar_zstd;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1214,6 +1231,10 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_lz4 = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+ /* Is this a ZSTD archive? */
+ is_tar_zstd = (archive_name_len > 8 &&
+ strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1223,7 +1244,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4)
+ if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4
+ && !is_tar_zstd)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
@@ -1295,6 +1317,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_lz4_compressor_new(streamer,
compresslevel);
}
+ else if (compressmethod == COMPRESSION_ZSTD)
+ {
+ strlcat(archive_filename, ".zst", sizeof(archive_filename));
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
+ streamer = bbstreamer_zstd_compressor_new(streamer,
+ compresslevel);
+ }
else
{
Assert(false); /* not reachable */
@@ -1353,6 +1383,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_gzip_decompressor_new(streamer);
else if (compressmethod == COMPRESSION_LZ4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
+ else if (compressmethod == COMPRESSION_ZSTD)
+ streamer = bbstreamer_zstd_decompressor_new(streamer);
}
/* Return the results. */
@@ -2020,6 +2052,9 @@ BaseBackup(void)
case COMPRESSION_LZ4:
compressmethodstr = "lz4";
break;
+ case COMPRESSION_ZSTD:
+ compressmethodstr = "zstd";
+ break;
default:
Assert(false);
break;
@@ -2869,6 +2904,14 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case COMPRESSION_ZSTD:
+ if (compresslevel > 22)
+ {
+ pg_log_error("compression level %d of method %s higher than maximum of 22",
+ compresslevel, "zstd");
+ exit(1);
+ }
+ break;
}
/*
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ce661a9ce4..8a4c2b8964 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -904,6 +904,10 @@ main(int argc, char **argv)
exit(1);
#endif
break;
+ case COMPRESSION_ZSTD:
+ pg_log_error("compression with %s is not yet supported", "ZSTD");
+ exit(1);
+
}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 2dfb353baa..ec54019cfc 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -24,6 +24,7 @@ typedef enum
{
COMPRESSION_GZIP,
COMPRESSION_LZ4,
+ COMPRESSION_ZSTD,
COMPRESSION_NONE
} WalCompressionMethod;
diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 851233a6e0..596df15118 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -10,6 +10,7 @@ export TAR
# name.
export GZIP_PROGRAM=$(GZIP)
export LZ4=$(LZ4)
+export ZSTD=$(ZSTD)
subdir = src/bin/pg_verifybackup
top_builddir = ../../..
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 383203d0b8..efbc910dfb 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -42,6 +42,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d', '-m'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
@@ -107,6 +115,7 @@ for my $tc (@test_configuration)
# Cleanup.
unlink($backup_path . '/backup_manifest');
unlink($backup_path . '/base.tar');
+ unlink($backup_path . '/' . $tc->{'backup_archive'});
rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index c51cdf79f8..d30ba01742 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -31,6 +31,11 @@ my @test_configuration = (
'compression_method' => 'lz4',
'backup_flags' => ['--compress', 'server-lz4:5'],
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:5'],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 3616529390..c2a6161be6 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -42,6 +42,14 @@ my @test_configuration = (
'decompress_flags' => [ '-d' ],
'output_file' => 'base.tar',
'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:5'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a3f8d37258..a7f16758a4 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,6 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 105f5c72a2..441d6ae6bf 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -380,6 +380,7 @@ sub mkvcbuild
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_gzip.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_inject.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_lz4.c');
+ $pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_zstd.c');
$pgbasebackup->AddFile('src/bin/pg_basebackup/bbstreamer_tar.c');
$pgbasebackup->AddLibrary('ws2_32.lib');
--
2.24.3 (Apple Git-128)
Hi Robert,
My proposed changes are largely cosmetic, but one thing that isn't is
revising the size - pos <= bound tests to instead check size - pos <
bound. My reasoning for that change is: if the number of bytes
remaining in the buffer is exactly equal to the maximum number we can
write, we don't need to flush it yet. If that sounds correct, we
should fix the LZ4 code the same way.
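For concreteness, a minimal sketch of the corrected test (an illustrative helper, not the committed code):

#include <stdbool.h>
#include <stddef.h>

/*
 * Flush only when the worst-case write no longer fits.  If the space
 * remaining exactly equals the bound, the write still fits, so '<'
 * rather than '<=' is the right comparison.
 */
static bool
needs_flush(size_t buffer_length, size_t bytes_written, size_t bound)
{
    return (buffer_length - bytes_written) < bound;
}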
I agree with your patch. The patch looks good to me.
Yes, the LZ4 flush check should also be fixed. Please find the attached
patch to fix the LZ4 code.
Regards,
Jeevan Ladhe
Attachments:
fix_lz4_flush_logic.patch (application/octet-stream)
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index d26032783c..472b620d7c 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -193,7 +193,7 @@ bbsink_lz4_archive_contents(bbsink *sink, size_t avail_in)
* LZ4F_compressBound(), ask the next sink to process the data so that we
* can empty the buffer.
*/
- if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <
avail_in_bound)
{
bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
@@ -238,7 +238,7 @@ bbsink_lz4_end_archive(bbsink *sink)
Assert(mysink->base.bbs_next->bbs_buffer_length >= lz4_footer_bound);
- if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <=
+ if ((mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written) <
lz4_footer_bound)
{
bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index f0bc226bf8..bde018246f 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -99,7 +99,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
compressed_bound = LZ4F_compressBound(streamer->base.bbs_buffer.maxlen, prefs);
/* Enlarge buffer if it falls short of compression bound. */
- if (streamer->base.bbs_buffer.maxlen <= compressed_bound)
+ if (streamer->base.bbs_buffer.maxlen < compressed_bound)
enlargeStringInfo(&streamer->base.bbs_buffer, compressed_bound);
ctxError = LZ4F_createCompressionContext(&streamer->cctx, LZ4F_VERSION);
@@ -170,7 +170,7 @@ bbstreamer_lz4_compressor_content(bbstreamer *streamer,
*/
out_bound = LZ4F_compressBound(len, &mystreamer->prefs);
Assert(mystreamer->base.bbs_buffer.maxlen >= out_bound);
- if (avail_out <= out_bound)
+ if (avail_out < out_bound)
{
bbstreamer_content(mystreamer->base.bbs_next, member,
mystreamer->base.bbs_buffer.data,
@@ -218,7 +218,7 @@ bbstreamer_lz4_compressor_finalize(bbstreamer *streamer)
/* Find out the footer bound and update the output buffer. */
footer_bound = LZ4F_compressBound(0, &mystreamer->prefs);
Assert(mystreamer->base.bbs_buffer.maxlen >= footer_bound);
- if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <=
+ if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <
footer_bound)
{
bbstreamer_content(mystreamer->base.bbs_next, NULL,
On Tue, Mar 8, 2022 at 4:49 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
I agree with your patch. The patch looks good to me.
Yes, the LZ4 flush check should also be fixed. Please find the attached
patch to fix the LZ4 code.
OK, committed all that stuff.
I think we also need to fix one other thing. Right now, for LZ4
support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
testing USE_LZ4, so I think we should be doing the same here. And
similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.
Patch for that attached.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
fix-symbol-tests.patch (application/octet-stream)
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index 472b620d7c..d838f723d0 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -12,13 +12,13 @@
*/
#include "postgres.h"
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
#include <lz4frame.h>
#endif
#include "replication/basebackup_sink.h"
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
typedef struct bbsink_lz4
{
@@ -62,7 +62,7 @@ const bbsink_ops bbsink_lz4_ops = {
bbsink *
bbsink_lz4_new(bbsink *next, int compresslevel)
{
-#ifndef HAVE_LIBLZ4
+#ifndef USE_LZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("lz4 compression is not supported by this build")));
@@ -87,7 +87,7 @@ bbsink_lz4_new(bbsink *next, int compresslevel)
#endif
}
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
/*
* Begin backup.
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index e3f9b1d4dc..c0e2be6e27 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -12,13 +12,13 @@
*/
#include "postgres.h"
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
#include <zstd.h>
#endif
#include "replication/basebackup_sink.h"
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
typedef struct bbsink_zstd
{
@@ -61,7 +61,7 @@ const bbsink_ops bbsink_zstd_ops = {
bbsink *
bbsink_zstd_new(bbsink *next, int compresslevel)
{
-#ifndef HAVE_LIBZSTD
+#ifndef USE_ZSTD
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("zstd compression is not supported by this build")));
@@ -86,7 +86,7 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
#endif
}
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
/*
* Begin backup.
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index bde018246f..810052e4e3 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -13,7 +13,7 @@
#include <unistd.h>
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
#include <lz4frame.h>
#endif
@@ -22,7 +22,7 @@
#include "common/file_perm.h"
#include "common/string.h"
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
typedef struct bbstreamer_lz4_frame
{
bbstreamer base;
@@ -69,7 +69,7 @@ const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
bbstreamer *
bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
{
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
LZ4F_preferences_t *prefs;
@@ -114,7 +114,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
#endif
}
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
/*
* Compress the input data to output buffer.
*
@@ -280,7 +280,7 @@ bbstreamer_lz4_compressor_free(bbstreamer *streamer)
bbstreamer *
bbstreamer_lz4_decompressor_new(bbstreamer *next)
{
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
LZ4F_errorCode_t ctxError;
@@ -309,7 +309,7 @@ bbstreamer_lz4_decompressor_new(bbstreamer *next)
#endif
}
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
/*
* Decompress the input data to output buffer until we run out of input
* data. Each time the output buffer is full, pass on the decompressed data
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index cc68367dd5..e86749a8fb 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -13,14 +13,14 @@
#include <unistd.h>
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
#include <zstd.h>
#endif
#include "bbstreamer.h"
#include "common/logging.h"
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
typedef struct bbstreamer_zstd_frame
{
@@ -65,7 +65,7 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
bbstreamer *
bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
{
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
Assert(next != NULL);
@@ -99,7 +99,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
#endif
}
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
/*
* Compress the input data to output buffer.
*
@@ -225,7 +225,7 @@ bbstreamer_zstd_compressor_free(bbstreamer *streamer)
bbstreamer *
bbstreamer_zstd_decompressor_new(bbstreamer *next)
{
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
Assert(next != NULL);
@@ -257,7 +257,7 @@ bbstreamer_zstd_decompressor_new(bbstreamer *next)
#endif
}
-#ifdef HAVE_LIBZSTD
+#ifdef USE_ZSTD
/*
* Decompress the input data to output buffer until we run out of input
* data. Each time the output buffer is full, pass on the decompressed data
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 8a4c2b8964..e2ceafeb0f 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -32,7 +32,7 @@
#include "receivelog.h"
#include "streamutil.h"
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
#include "lz4frame.h"
#endif
@@ -382,7 +382,7 @@ FindStreamingStart(uint32 *tli)
}
else if (!ispartial && wal_compression_method == COMPRESSION_LZ4)
{
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
#define LZ4_CHUNK_SZ 64 * 1024 /* 64kB as maximum chunk size read */
int fd;
ssize_t r;
@@ -889,7 +889,7 @@ main(int argc, char **argv)
#endif
break;
case COMPRESSION_LZ4:
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (compresslevel != 0)
{
pg_log_error("cannot use --compress with --compression-method=%s",
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 545618e0b2..8c38816b22 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -141,7 +141,7 @@ SKIP:
SKIP:
{
skip "postgres was not built with LZ4 support", 5
- if (!check_pg_config("#define HAVE_LIBLZ4 1"));
+ if (!check_pg_config("#define USE_LZ4 1"));
# Generate more WAL including one completed, compressed segment.
$primary->psql('postgres', 'SELECT pg_switch_wal();');
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index a6d08c1270..1e0ff760eb 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -18,7 +18,7 @@
#include <time.h>
#include <unistd.h>
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
#include <lz4frame.h>
#endif
#ifdef HAVE_LIBZ
@@ -70,7 +70,7 @@ typedef struct DirectoryMethodFile
#ifdef HAVE_LIBZ
gzFile gzfp;
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
LZ4F_compressionContext_t ctx;
size_t lz4bufsize;
void *lz4buf;
@@ -114,7 +114,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
#ifdef HAVE_LIBZ
gzFile gzfp = NULL;
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
LZ4F_compressionContext_t ctx = NULL;
size_t lz4bufsize = 0;
void *lz4buf = NULL;
@@ -160,7 +160,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
}
}
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (dir_data->compression_method == COMPRESSION_LZ4)
{
size_t ctx_out;
@@ -245,7 +245,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
gzclose(gzfp);
else
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (dir_data->compression_method == COMPRESSION_LZ4)
{
(void) LZ4F_compressEnd(ctx, lz4buf, lz4bufsize, NULL);
@@ -265,7 +265,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
if (dir_data->compression_method == COMPRESSION_GZIP)
f->gzfp = gzfp;
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (dir_data->compression_method == COMPRESSION_LZ4)
{
f->ctx = ctx;
@@ -306,7 +306,7 @@ dir_write(Walfile f, const void *buf, size_t count)
}
else
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (dir_data->compression_method == COMPRESSION_LZ4)
{
size_t chunk;
@@ -394,7 +394,7 @@ dir_close(Walfile f, WalCloseMethod method)
}
else
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (dir_data->compression_method == COMPRESSION_LZ4)
{
size_t compressed;
@@ -487,7 +487,7 @@ dir_close(Walfile f, WalCloseMethod method)
if (r != 0)
dir_data->lasterrno = errno;
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
pg_free(df->lz4buf);
/* supports free on NULL */
LZ4F_freeCompressionContext(df->ctx);
@@ -523,7 +523,7 @@ dir_sync(Walfile f)
}
}
#endif
-#ifdef HAVE_LIBLZ4
+#ifdef USE_LZ4
if (dir_data->compression_method == COMPRESSION_LZ4)
{
DirectoryMethodFile *df = (DirectoryMethodFile *) f;
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 3e55ff26f8..fd1052e5db 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3743,7 +3743,7 @@ if ($collation_check_stderr !~ /ERROR: /)
}
# Determine whether build supports LZ4.
-my $supports_lz4 = check_pg_config("#define HAVE_LIBLZ4 1");
+my $supports_lz4 = check_pg_config("#define USE_LZ4 1");
# Create additional databases for mutations of schema public
$node->psql('postgres', 'create database regress_pg_dump_test;');
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index efbc910dfb..9bfd7023fb 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -41,7 +41,7 @@ my @test_configuration = (
'backup_archive' => 'base.tar.lz4',
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d', '-m'],
- 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ 'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
@@ -49,7 +49,7 @@ my @test_configuration = (
'backup_archive' => 'base.tar.zst',
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
- 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
+ 'enabled' => check_pg_config("#define USE_ZSTD 1")
}
);
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index d30ba01742..9f9cc7540b 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -30,12 +30,12 @@ my @test_configuration = (
{
'compression_method' => 'lz4',
'backup_flags' => ['--compress', 'server-lz4:5'],
- 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ 'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
'backup_flags' => ['--compress', 'server-zstd:5'],
- 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
+ 'enabled' => check_pg_config("#define USE_ZSTD 1")
}
);
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index c2a6161be6..16a752195e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -41,7 +41,7 @@ my @test_configuration = (
'decompress_program' => $ENV{'LZ4'},
'decompress_flags' => [ '-d' ],
'output_file' => 'base.tar',
- 'enabled' => check_pg_config("#define HAVE_LIBLZ4 1")
+ 'enabled' => check_pg_config("#define USE_LZ4 1")
},
{
'compression_method' => 'zstd',
@@ -49,7 +49,7 @@ my @test_configuration = (
'backup_archive' => 'base.tar.zst',
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
- 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
+ 'enabled' => check_pg_config("#define USE_ZSTD 1")
}
);
OK, committed all that stuff.
Thanks for the commit Robert.
I think we also need to fix one other thing. Right now, for LZ4
support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
testing USE_LZ4, so I think we should be doing the same here. And
similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.
I reviewed the patch, and it seems to be capturing and replacing all the
places of HAVE_LIB* with USE_* correctly.
Just curious, apart from consistency, do you see other problems as well
when testing one vs the other?
Regards,
Jeevan Ladhe
On Tue, Mar 8, 2022 at 11:32 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
I reviewed the patch, and it seems to be capturing and replacing all the
places of HAVE_LIB* with USE_* correctly.
Just curious, apart from consistency, do you see other problems as well
when testing one vs the other?
So, the kind of problem you would worry about in a case like this is:
suppose that configure detects LIBLZ4, but the user specifies
--without-lz4. Then maybe there is some way for HAVE_LIBLZ4 to be
true, while USE_LIBLZ4 is false, and therefore we should not be
compiling code that uses LZ4 but do anyway. As configure.ac is
currently coded, I think that's impossible, because we only search for
liblz4 if the user says --with-lz4, and if they do that, then USE_LZ4
will be set. Therefore, I don't think there is a live problem here,
just an inconsistency.
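In preprocessor terms, the guard pattern at issue is roughly this (a sketch, not taken from the tree):

/*
 * HAVE_LIBLZ4 would mean "configure found liblz4"; USE_LZ4 means
 * "the user requested --with-lz4".  Feature code should test the
 * latter, so an installed-but-unrequested library is never used.
 */
#ifdef USE_LZ4
#include <lz4frame.h>
#endif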
Probably still best to clean it up before an angry Andres chases me
down, since I know he's working on the build system...
--
Robert Haas
EDB: http://www.enterprisedb.com
ok got it. Thanks for your insights.
Regards,
Jeevan Ladhe
I'm getting errors from pg_basebackup when using both -D- and --compress=server-*
The issue seems to go away if I use --no-manifest.
$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
pg_basebackup: error: tar member has empty name
1
$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
pg_basebackup: error: COPY stream ended before last file was finished
1
On Thu, Mar 10, 2022 at 8:02 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
I'm getting errors from pg_basebackup when using both -D- and --compress=server-*
The issue seems to go away if I use --no-manifest.
$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
pg_basebackup: error: tar member has empty name
1
$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
pg_basebackup: error: COPY stream ended before last file was finished
1
Thanks for the report. The problem here is that, when the output is
standard output (-D -), pg_basebackup can only produce a single output
file, so the manifest gets injected into the tar file on the client
side rather than being written separately as we do in normal cases.
However, that only works if we're receiving a tar file that we can
parse from the server, and here the server is sending a compressed
tarfile. The current code mistakenly attempts to parse the compressed
tarfile as if it were an uncompressed tarfile, which causes the error
messages that you are seeing (and which I can also reproduce here). We
actually have enough infrastructure available in pg_basebackup now
that we could do the "right thing" in this case: decompress the data
received from the server, parse the resulting tar file, inject the
backup manifest, construct a new tar file, and recompress. However, I
think that's probably not a good idea, because it's unlikely that the
user will understand that the data is being compressed on the server,
then decompressed, and then recompressed again, and the performance of
the resulting pipeline will probably not be very good. So I think we
should just refuse this command. Patch for that attached.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
reject-compressed-inject.patch (application/octet-stream)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9f3ecc60fb..43c4036eee 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1208,7 +1208,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
bool is_tar,
is_tar_gz,
is_tar_lz4,
- is_tar_zstd;
+ is_tar_zstd,
+ is_compressed_tar;
bool must_parse_archive;
int archive_name_len = strlen(archive_name);
@@ -1235,6 +1236,24 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar_zstd = (archive_name_len > 8 &&
strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+ /* Is this any kind of compressed tar? */
+ is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+
+ /*
+ * Injecting the manifest into a compressed tar file would be possible if
+ * we decompressed it, parsed the tarfile, generated a new tarfile, and
+ * recompressed it, but compressing and decompressing multiple times just
+ * to inject the manifest seems inefficient enough that it's probably not
+ * what the user wants. So, instead, reject the request and tell the user
+ * to specify something more reasonable.
+ */
+ if (inject_manifest && is_compressed_tar)
+ {
+ pg_log_error("cannot inject manifest into a compressed tarfile");
+ pg_log_info("use client-side compression, send the output to a directory rather than standard output, or use --no-manifest");
+ exit(1);
+ }
+
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
* (2) we need to inject backup_manifest or recovery configuration into it.
@@ -1244,8 +1263,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_tar_gz && !is_tar_lz4
- && !is_tar_zstd)
+ if (must_parse_archive && !is_tar && !is_compressed_tar)
{
pg_log_error("unable to parse archive: %s", archive_name);
pg_log_info("only tar archives can be parsed");
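With this patch, the invocation from the report should fail up front instead of mis-parsing the stream; roughly (output reconstructed from the patch's messages, exact formatting may differ):

$ pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip
pg_basebackup: error: cannot inject manifest into a compressed tarfile
pg_basebackup: use client-side compression, send the output to a directory rather than standard output, or use --no-manifest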
On Fri, Mar 11, 2022 at 10:19:29AM -0500, Robert Haas wrote:
So I think we should just refuse this command. Patch for that attached.
Sounds right.
Also, I think the magic 8 for .gz should actually be a 7.
I'm not sure why it tests for ".gz" but not ".tar.gz", which would help to make
them all less magic.
commit 1fb1e21ba7a500bb2b85ec3e65f59130fcdb4a7e
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu Mar 10 21:22:16 2022 -0600
pg_basebackup: make magic numbers less magic
The magic 8 for .gz should actually be a 7.
.tar.gz
1234567
.tar.lz4
.tar.zst
12345678
See d45099425, 751b8d23b, 7cf085f07.
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9f3ecc60fbe..8dd9721323d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1223,17 +1223,17 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
is_tar = (archive_name_len > 4 &&
strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
- /* Is this a gzip archive? */
- is_tar_gz = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+ /* Is this a .tar.gz archive? */
+ is_tar_gz = (archive_name_len > 7 &&
+ strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
- /* Is this a LZ4 archive? */
+ /* Is this a .tar.lz4 archive? */
is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+ strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
- /* Is this a ZSTD archive? */
+ /* Is this a .tar.zst archive? */
is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+ strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
/*
* We have to parse the archive if (1) we're suppose to extract it, or if
On Fri, Mar 11, 2022 at 11:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Sounds right.
OK, committed.
Also, I think the magic 8 for .gz should actually be a 7.
I'm not sure why it tests for ".gz" but not ".tar.gz", which would help to make
them all less magic.
commit 1fb1e21ba7a500bb2b85ec3e65f59130fcdb4a7e
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu Mar 10 21:22:16 2022 -0600
Yeah, your patch looks right. Committed that, too.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Feb 15, 2022 at 11:26 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 9, 2022 at 8:41 AM Abhijit Menon-Sen <ams@toroid.org> wrote:
It took me a while to assimilate these patches, including the backup
targets one, which I hadn't looked at before. Now that I've wrapped my
head around how to put the pieces together, I really like the idea. As
you say, writing non-trivial integrations in C will take some effort,
but it seems worthwhile. It's also nice that one can continue to use
pg_basebackup to trigger the backups and see progress information.
Cool. Thanks for having a look.
Yes, it looks simple to follow the example set by basebackup_to_shell to
write a custom target. The complexity will be in whatever we need to do
to store/forward the backup data, rather than in obtaining the data in
the first place, which is exactly as it should be.
Yeah, that's what made me really happy with how this came out.
Here's v2, rebased and with documentation added.
I don't hear many comments on this, but I'm pretty sure that it's a
good idea, and there haven't been many objections to this patch series
as a whole, so I'd like to proceed with it. If nobody objects
vigorously, I'll commit this next week.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2022-03-11 10:19:29 -0500, Robert Haas wrote:
[...] So I think we should just refuse this command. Patch for that attached.
You could also just append a manifest as a compressed tar to the compressed tar
stream. Unfortunately GNU tar requires -i to read concatenated compressed
archives, so perhaps that's not quite an alternative.
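For illustration, the append trick would look something like this (assuming GNU tar):

$ cat base.tar.gz manifest.tar.gz > combined.tar.gz
$ tar -tzf combined.tar.gz     # stops at the first archive's end-of-archive blocks
$ tar -itzf combined.tar.gz    # -i (--ignore-zeros) reads the concatenated members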
Greetings,
Andres Freund
On Fri, Mar 11, 2022 at 8:52 PM Andres Freund <andres@anarazel.de> wrote:
You could also just append a manifest as a compresed tar to the compressed tar
stream. Unfortunately GNU tar requires -i to read concated compressed
archives, so perhaps that's not quite an alternative.
s/Unfortunately/Fortunately/ :-p
I think we've already gone way too far in the direction of making this
stuff rely on specific details of the tar format. What if someday we
wanted to switch to pax, cpio, zip, 7zip, whatever, or even just have
one of those things as an option? It's not that I'm dying to have
PostgreSQL produce rar or arj files, but I think we box ourselves into
a corner when we just assume tar everywhere. As an example of a
similar issue with real consequences, consider the recent discovery
that we can't easily add support for LZ4 or ZSTD compression of
pg_wal.tar. The problem is that the existing code tells the gzip
library to emit the tar header as part of the compressed stream
without actually compressing it, and then it goes back and overwrites
that data later! Unsurprisingly, that's not a feature every
compression library offers.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
I tried to implement support for parallel ZSTD compression. The
library provides a parameter (ZSTD_c_nbWorkers) to specify the
number of compression workers. If this parameter is set, the
library performs parallel compression using the specified number
of workers.
The user can specify the number of parallel workers as part of the
--compress option by appending an integer value after an at sign (@).
(-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])
Please find the attached patch v1 with the above changes.
Note: ZSTD library version 1.5.x supports parallel compression
by default; for library versions lower than 1.5.x, parallel
compression is available only if the library was compiled with the
build macro ZSTD_MULTITHREAD. If the linked library version doesn't
support parallel compression, setting the parameter
ZSTD_c_nbWorkers to a value other than 0 is rejected with an error.
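For reference, a minimal standalone sketch of how the library's multi-threaded mode is enabled (not taken from the patch):

#include <stdio.h>
#include <zstd.h>

int
main(void)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();

    /* 0 = single-threaded (default); >0 enables parallel compression */
    size_t ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

    if (ZSTD_isError(ret))
        fprintf(stderr, "nbWorkers not supported: %s\n",
                ZSTD_getErrorName(ret));

    ZSTD_freeCCtx(cctx);
    return 0;
}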
Thanks,
Dipesh
Attachments:
v1-0001-support-parallel-zstd-compression.patch (text/x-patch)
From 688ad1e3f9b43bf911e8c3837497a874e4a6937f Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pandit@enterprisedb.com>
Date: Mon, 14 Mar 2022 18:39:02 +0530
Subject: [PATCH] support parallel zstd compression
---
doc/src/sgml/ref/pg_basebackup.sgml | 11 ++-
src/backend/replication/basebackup.c | 14 +++-
src/backend/replication/basebackup_zstd.c | 15 ++++-
src/bin/pg_basebackup/bbstreamer.h | 3 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 11 ++-
src/bin/pg_basebackup/pg_basebackup.c | 97 +++++++++++++++++++++------
src/bin/pg_verifybackup/t/008_untar.pl | 8 +++
src/bin/pg_verifybackup/t/010_client_untar.pl | 8 +++
src/include/replication/basebackup_sink.h | 2 +-
9 files changed, 142 insertions(+), 27 deletions(-)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 4a630b5..87feca0 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -399,9 +399,9 @@ PostgreSQL documentation
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
- <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>][@<replaceable>workers</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>][@<replaceable>workers</replaceable>]</term>
<listitem>
<para>
Requests compression of the backup. If <literal>client</literal> or
@@ -428,6 +428,13 @@ PostgreSQL documentation
the level is 0.
</para>
<para>
+ Compression workers can be specified optionally by appending the
+ number of workers after an at sign (<literal>@</literal>). It
+ defines the degree of parallelism while compressing the archive.
+ Currently, parallel compression is supported only for
+ <literal>zstd</literal> compressed archives.
+ </para>
+ <para>
When the tar format is used with <literal>gzip</literal>,
<literal>lz4</literal>, or <literal>zstd</literal>, the suffix
<filename>.gz</filename>, <filename>.lz4</filename>, or
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 2378ce5..8217fa9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -82,6 +82,7 @@ typedef struct
backup_manifest_option manifest;
basebackup_compression_type compression;
int compression_level;
+ int compression_workers;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -718,6 +719,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
char *target_str = "compat"; /* placate compiler */
bool o_compression = false;
bool o_compression_level = false;
+ bool o_compression_workers = false;
MemSet(opt, 0, sizeof(*opt));
opt->target = BACKUP_TARGET_CLIENT;
@@ -925,6 +927,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->compression_level = defGetInt32(defel);
o_compression_level = true;
}
+ else if (strcmp(defel->defname, "compression_workers") == 0)
+ {
+ if (o_compression_workers)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("duplicate option \"%s\"", defel->defname)));
+ opt->compression_workers = defGetInt32(defel);
+ o_compression_workers = true;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1030,7 +1041,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
sink = bbsink_lz4_new(sink, opt.compression_level);
else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
- sink = bbsink_zstd_new(sink, opt.compression_level);
+ sink = bbsink_zstd_new(sink, opt.compression_level,
+ opt.compression_workers);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index e3f9b1d..54b91eb 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -28,6 +28,9 @@ typedef struct bbsink_zstd
/* Compression level */
int compresslevel;
+ /* Compression workers*/
+ int compressworkers;
+
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
} bbsink_zstd;
@@ -59,7 +62,7 @@ const bbsink_ops bbsink_zstd_ops = {
* designated compression level.
*/
bbsink *
-bbsink_zstd_new(bbsink *next, int compresslevel)
+bbsink_zstd_new(bbsink *next, int compresslevel, int compressworkers)
{
#ifndef HAVE_LIBZSTD
ereport(ERROR,
@@ -81,6 +84,7 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
sink->compresslevel = compresslevel;
+ sink->compressworkers = compressworkers;
return &sink->base;
#endif
@@ -96,6 +100,7 @@ bbsink_zstd_begin_backup(bbsink *sink)
{
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
+ size_t ret;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
@@ -104,6 +109,14 @@ bbsink_zstd_begin_backup(bbsink *sink)
ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
mysink->compresslevel);
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ mysink->compressworkers);
+
+ if (ZSTD_isError(ret))
+ elog(ERROR,
+ "could not compress data: %s",
+ ZSTD_getErrorName(ret));
+
/*
* We need our own buffer, because we're going to pass different data to
* the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index 02d4c05..dbaf6d6 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -210,7 +210,8 @@ extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
int compresslevel);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- int compresslevel);
+ int compresslevel,
+ int compressworkers);
extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index cc68367..f3d453e 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -63,10 +63,12 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel,
+ int compressworkers)
{
#ifdef HAVE_LIBZSTD
bbstreamer_zstd_frame *streamer;
+ size_t ret;
Assert(next != NULL);
@@ -87,6 +89,13 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
compresslevel);
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compressworkers);
+
+ if (ZSTD_isError(ret))
+ pg_log_error("could not compress data: %s",
+ ZSTD_getErrorName(ret));
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d265ee3..55a321d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -133,6 +133,7 @@ static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
static int compresslevel = 0;
+static int compressworkers = 0;
static WalCompressionMethod compressmethod = COMPRESSION_NONE;
static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
@@ -405,8 +406,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"
- " compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS]\n"
+ " compress tar output with given compression method or level or workers\n"));
printf(_(" -Z, --compress=none do not compress tar output\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -1005,29 +1006,36 @@ parse_max_rate(char *src)
/*
* Utility wrapper to parse the values specified for -Z/--compress.
- * *methodres and *levelres will be optionally filled with values coming
- * from the parsed results.
+ * *methodres, *levelres and *workerres will be optionally filled with values
+ * coming from the parsed result.
*/
static void
parse_compress_options(char *src, WalCompressionMethod *methodres,
- CompressionLocation *locationres, int *levelres)
+ CompressionLocation *locationres, int *levelres,
+ int *workerres)
{
char *sep;
- int firstlen;
- char *firstpart;
+ int firstlen,
+ secondlen;
+ char *firstpart,
+ *secondpart;
/*
- * clear 'levelres' so that if there are multiple compression options,
- * the last one fully overrides the earlier ones
+ * clear 'levelres' and 'workerres' so that if there are multiple
+ * compression options, the last one fully overrides the earlier ones.
*/
*levelres = 0;
+ *workerres = 0;
- /* check if the option is split in two */
+ /* check if the option is split in two using either ':' or '@'. */
sep = strchr(src, ':');
+ if (sep == NULL)
+ sep = strchr(src, '@');
+
/*
* The first part of the option value could be a method name, or just a
- * level value.
+ * level value or compression workers.
*/
firstlen = (sep != NULL) ? (sep - src) : strlen(src);
firstpart = pg_malloc(firstlen + 1);
@@ -1107,32 +1115,76 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
return;
}
+ /* Check for the second part of the input option. */
+ sep = strchr(src, ':');
+
+ if (sep != NULL)
+ {
+ /* Check the contents after the colon separator. */
+ sep++;
+ if (*sep == '\0')
+ {
+ pg_log_error("no compression level defined for method %s", firstpart);
+ exit(1);
+ }
+
+ /* Check if the option can be further split into two. */
+ src = sep;
+ sep = strchr(src, '@');
+
+ /* The second part of the value is compression level. */
+ secondlen = (sep != NULL) ? (sep - src) : strlen(src);
+ secondpart = pg_malloc(secondlen + 1);
+ memcpy(secondpart, src, secondlen);
+ secondpart[secondlen] = '\0';
+
+ /*
+ * For any of the methods currently supported, the data after the
+ * separator can just be an integer.
+ */
+ if (!option_parse_int(secondpart, "-Z/--compress", 0, INT_MAX,
+ levelres))
+ exit(1);
+
+ free(secondpart);
+ }
+
+ /* Check for the third part of the input option. */
+ sep = strchr(src, '@');
+
if (sep == NULL)
{
/*
- * The caller specified a method without a colon separator, so let any
- * subsequent checks assign a default level.
+ * The caller specified a method without a '@' separator, so let any
+ * subsequent checks assign a default number of workers.
*/
free(firstpart);
return;
}
- /* Check the contents after the colon separator. */
+ /* Check the contents after the '@' separator. */
sep++;
if (*sep == '\0')
{
- pg_log_error("no compression level defined for method %s", firstpart);
+ pg_log_error("compression workers are not defined for method %s", firstpart);
exit(1);
}
/*
- * For any of the methods currently supported, the data after the
- * separator can just be an integer.
+ * The data after '@' separator can just be an integer and it identifies
+ * the number of compression workers.
*/
if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
- levelres))
+ workerres))
exit(1);
+ if (*methodres != COMPRESSION_ZSTD)
+ {
+ pg_log_error("cannot use compression workers with method %s",
+ firstpart);
+ exit(1);
+ }
+
free(firstpart);
}
@@ -1341,7 +1393,8 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
streamer = bbstreamer_zstd_compressor_new(streamer,
- compresslevel);
+ compresslevel,
+ compressworkers);
}
else
{
@@ -2082,6 +2135,9 @@ BaseBackup(void)
if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
AppendIntegerCommandOption(&buf, use_new_option_syntax,
"COMPRESSION_LEVEL", compresslevel);
+ if (compressworkers > 1)
+ AppendIntegerCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_WORKERS", compressworkers);
}
if (verbose)
@@ -2626,7 +2682,8 @@ main(int argc, char **argv)
break;
case 'Z':
parse_compress_options(optarg, &compressmethod,
- &compressloc, &compresslevel);
+ &compressloc, &compresslevel,
+ &compressworkers);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index efbc910..86e2c8a 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -50,6 +50,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'server-zstd@4'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index c2a6161..ac5ae31 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -50,6 +50,14 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
+ },
+ {
+ 'compression_method' => 'zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:5@4'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define HAVE_LIBZSTD 1")
}
);
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a7f1675..5c1dd32 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -285,7 +285,7 @@ extern void bbsink_forward_cleanup(bbsink *sink);
extern bbsink *bbsink_copystream_new(bool send_to_client);
extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel, int compressworkers);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
--
1.8.3.1
On Mon, Mar 14, 2022 at 09:41:35PM +0530, Dipesh Pandit wrote:
I tried to implement support for parallel ZSTD compression. The
library provides a parameter (ZSTD_c_nbWorkers) to specify the
number of compression workers. If this parameter is set, the
library performs parallel compression using the specified number
of workers.
The user can specify the number of parallel workers as part of the
--compress option by appending an integer value after an at sign (@).
(-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])
I suggest to use a syntax that's more general than that, maybe something like
:[level=]N,parallel=N,flag,flag,...
For example, someone may want to use zstd "long" mode or (when it's released)
rsyncable mode, or specify fine-grained compression parameters (strategy,
windowLog, hashLog, etc).
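Under such a scheme, invocations might look like this (hypothetical syntax, for illustration only):

--compress=server-zstd:level=5,parallel=4
--compress=server-zstd:parallel=4,long
--compress=client-gzip:9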
I hope the same syntax will be shared with wal_compression and pg_dump.
And libpq, if that patch progresses.
BTW, I think this may be better left for PG16.
--
Justin
On Mon, Mar 14, 2022 at 12:35 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
I suggest to use a syntax that's more general than that, maybe something like
:[level=]N,parallel=N,flag,flag,...
For example, someone may want to use zstd "long" mode or (when it's released)
rsyncable mode, or specify fine-grained compression parameters (strategy,
windowLog, hashLog, etc).
That's an interesting idea. I wonder what the replication protocol
ought to look like in that case. Should we have a COMPRESSION_DETAIL
argument that is just a string, and let the server parse it out? Or
separate protocol-level options? It does feel reasonable to have both
COMPRESSION_LEVEL and COMPRESSION_WORKERS as first-class options, but
I don't know that we want COMPRESSION_HASHLOG true as part of our
first-class grammar.
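A string-valued option might look like this on the wire (a sketch of the idea, not a settled grammar):

BASE_BACKUP ( COMPRESSION 'zstd', COMPRESSION_DETAIL 'level=5,workers=4' )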
I hope the same syntax will be shared with wal_compression and pg_dump.
And libpq, if that patch progresses.
BTW, I think this may be better left for PG16.
Possibly so ... but if we're thinking of any revisions to the
newly-added grammar, we had better take care of that now, before it's
set in stone.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Mar 14, 2022 at 01:02:20PM -0400, Robert Haas wrote:
On Mon, Mar 14, 2022 at 12:35 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
I suggest to use a syntax that's more general than that, maybe something like
:[level=]N,parallel=N,flag,flag,...
For example, someone may want to use zstd "long" mode or (when it's released)
rsyncable mode, or specify fine-grained compression parameters (strategy,
windowLog, hashLog, etc).
That's an interesting idea. I wonder what the replication protocol
ought to look like in that case. Should we have a COMPRESSION_DETAIL
argument that is just a string, and let the server parse it out? Or
separate protocol-level options? It does feel reasonable to have both
COMPRESSION_LEVEL and COMPRESSION_WORKERS as first-class options, but
I don't know that we want COMPRESSION_HASHLOG true as part of our
first-class grammar.
I was only referring to the user-facing grammar.
Internally, I was thinking they'd all be handled as first-class options, with
separate struct fields and separate replication protocol options. If an option
isn't known, it'd be rejected on the client side, rather than causing an error
on the server.
Maybe there'd be an option parser for this in common/ (I think that might
require having new data structure there too, maybe one for each compression
method, or maybe a union{} to handles them all). Most of the ~100 lines to
support wal_compression='zstd:N' are to parse out the N.
--
Justin
On Mon, Mar 14, 2022 at 1:11 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Internally, I was thinking they'd all be handled as first-class options, with
separate struct fields and separate replication protocol options. If an option
isn't known, it'd be rejected on the client side, rather than causing an error
on the server.
There's some appeal to that, but one downside is that it means that
the client can't be used to fetch data that is compressed in a way
that the server knows about and the client doesn't. I don't think
that's great. Why should, for example, pg_basebackup need to be
compiled with zstd support in order to request zstd compression on the
server side? If the server knows about the brand new
justin-magic-sauce compression algorithm, maybe the client should just
be able to request it and, when given various .jms files by the
server, shrug its shoulders and accept them for what they are. That
doesn't work if -Fp is involved, or similar, but it should work fine
for simple cases if we set things up right.
Maybe there'd be an option parser for this in common/ (I think that might
require having new data structure there too, maybe one for each compression
method, or maybe a union{} to handles them all). Most of the ~100 lines to
support wal_compression='zstd:N' are to parse out the N.
Yes, it's actually a very simple feature now that we've got the rest
of the infrastructure set up correctly for it.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for the patch, Dipesh.
I had a look at the patch and also tried taking a backup. I have the
following suggestions and observations:
I get the following error at my end:
$ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
pg_basebackup: error: could not initiate base backup: ERROR: could not
compress data: Unsupported parameter
pg_basebackup: removing data directory "/tmp/zstd_bk"
This is mostly because I have zstd library version v1.4.4, which
does not support parallel workers by default. Maybe we should
emit a better error, something hinting that parallelism is not
supported by this particular build.
The pg_verifybackup regression test 008_untar.pl also fails with a
similar error. I think we should have some logic in the regression
test to skip it if the parameter is not supported.
+ if (ZSTD_isError(ret))
+ elog(ERROR,
+ "could not compress data: %s",
+ ZSTD_getErrorName(ret));
I think all of this can go on one line, but in any case we should
improve the error message here.
Also, just a thought, for the versions where parallelism is not
supported, should we instead just throw a warning and fall back to
non-parallel behavior?
Regards,
Jeevan Ladhe
On Tue, Mar 15, 2022 at 6:33 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
I get the following error at my end:
$ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
pg_basebackup: error: could not initiate base backup: ERROR: could not compress data: Unsupported parameter
pg_basebackup: removing data directory "/tmp/zstd_bk"
This is mostly because I have zstd library version v1.4.4, which
does not support parallel workers by default. Maybe we should
emit a better error, something hinting that parallelism is not
supported by this particular build.
I'm not averse to trying to improve that error message, but honestly
I'd consider that to be good enough already to be acceptable. We could
think about trying to add an errhint() telling you that the problem
may be with your libzstd build.
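Something along these lines, perhaps (a sketch only; wording up for debate):

ereport(ERROR,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         errmsg("could not set compression worker count to %d: %s",
                compressworkers, ZSTD_getErrorName(ret)),
         errhint("The loaded libzstd may have been built without multithreading support.")));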
The regression for pg_verifybackup test 008_untar.pl also fails with a
similar error. Here, I think we should have some logic in regression to
skip the test if the parameter is not supported?
Or at least to have the test not fail.
Also, just a thought, for the versions where parallelism is not
supported, should we instead just throw a warning and fall back to
non-parallel behavior?
I don't think so. I think it's better for the user to get an error and
then change their mind and request something we can do.
--
Robert Haas
EDB: http://www.enterprisedb.com
Should zstd's negative compression levels be supported here?
Here's a POC patch which is enough to play with it.
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd |wc -c
12305659
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:1 |wc -c
13827521
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:0 |wc -c
12304018
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-1 |wc -c
16443893
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-2 |wc -c
17349563
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-4 |wc -c
19452631
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-7 |wc -c
21871505
Also, with a partial regression DB, this crashes when writing to stdout.
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=lz4 |wc -c
pg_basebackup: bbstreamer_lz4.c:172: bbstreamer_lz4_compressor_content: Assertion `mystreamer->base.bbs_buffer.maxlen >= out_bound' failed.
24117248
#4 0x000055555555e8b4 in bbstreamer_lz4_compressor_content (streamer=0x5555555a5260, member=0x7fffffffc760,
data=0x7ffff3068010 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n\"Files\": [\n{ \"Path\": \"backup_label\", \"Size\": 227, \"Last-Modified\": \"2022-03-16 02:29:11 GMT\", \"Checksum-Algorithm\": \"CRC32C\", \"Checksum\": \"46f69d99\" },\n{ \"Pa"..., len=401072, context=BBSTREAMER_MEMBER_CONTENTS) at bbstreamer_lz4.c:172
mystreamer = 0x5555555a5260
next_in = 0x7ffff3068010 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n\"Files\": [\n{ \"Path\": \"backup_label\", \"Size\": 227, \"Last-Modified\": \"2022-03-16 02:29:11 GMT\", \"Checksum-Algorithm\": \"CRC32C\", \"Checksum\": \"46f69d99\" },\n{ \"Pa"...
...
(gdb) p mystreamer->base.bbs_buffer.maxlen
$1 = 524288
(gdb) p (int) LZ4F_compressBound(len, &mystreamer->prefs)
$4 = 524300
This is with: liblz4-1:amd64 1.9.2-2ubuntu0.20.04.1
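To reproduce the arithmetic outside the assertion (input length taken from the backtrace above; the exact bound depends on the preferences and library version):

#include <stdio.h>
#include <lz4frame.h>

int
main(void)
{
    LZ4F_preferences_t prefs = {0};    /* stand-in for mystreamer->prefs */

    /* 524300 in the report (liblz4 1.9.2 with the streamer's prefs),
     * i.e. 12 bytes more than the 524288-byte buffer being checked */
    printf("%zu\n", LZ4F_compressBound(401072, &prefs));
    return 0;
}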
Attachments:
0001-pg_basebackup-support-Zstd-negative-compression-leve.patch (text/x-diff)
From 9a330a3a1801352cef3b5912e31aba61760dac32 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH] pg_basebackup: support Zstd negative compression levels
"higher than maximum" is bogus
TODO: each compression method should enforce its own levels
---
src/backend/replication/basebackup_zstd.c | 8 ++++++--
src/backend/replication/repl_scanner.l | 4 +++-
src/bin/pg_basebackup/bbstreamer_zstd.c | 7 ++++++-
src/bin/pg_basebackup/pg_basebackup.c | 23 ++++++++++++-----------
4 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27b..4464fcb30e1 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -71,7 +71,7 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 22)
+ if (compresslevel < -7 || compresslevel > 22)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("zstd compression level %d is out of range",
@@ -96,13 +96,17 @@ bbsink_zstd_begin_backup(bbsink *sink)
{
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
+ size_t ret;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
elog(ERROR, "could not create zstd compression context");
- ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
mysink->compresslevel);
+ if (ZSTD_isError(ret))
+ elog(ERROR, "could not create zstd compression context: %s",
+ ZSTD_getErrorName(ret));
/*
* We need our own buffer, because we're going to pass different data to
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 4b64c0d768b..05c4ef463a1 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -86,6 +86,7 @@ xdinside [^"]+
digit [0-9]
hexdigit [0-9A-Fa-f]
+sign ("-"|"+")
ident_start [A-Za-z\200-\377_]
ident_cont [A-Za-z\200-\377_0-9\$]
@@ -127,9 +128,10 @@ NOEXPORT_SNAPSHOT { return K_NOEXPORT_SNAPSHOT; }
USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
+
{space}+ { /* do nothing */ }
-{digit}+ {
+{sign}?{digit}+ {
yylval.uintval = strtoul(yytext, NULL, 10);
return UCONST;
}
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb7..337e789b6a1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,6 +67,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
+ size_t ret;
Assert(next != NULL);
@@ -84,9 +85,13 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
pg_log_error("could not create zstd compression context");
/* Initialize stream compression preferences */
- ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
compresslevel);
+ if (ZSTD_isError(ret))
+ pg_log_error("could not create zstd compression context: %s",
+ ZSTD_getErrorName(ret));
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2943d9ec1a0..2db600a34be 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1129,7 +1129,7 @@ parse_compress_options(char *src, WalCompressionMethod *methodres,
* For any of the methods currently supported, the data after the
* separator can just be an integer.
*/
- if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
+ if (!option_parse_int(sep, "-Z/--compress", -7, INT_MAX,
levelres))
exit(1);
@@ -2079,7 +2079,7 @@ BaseBackup(void)
}
AppendStringCommandOption(&buf, use_new_option_syntax,
"COMPRESSION", compressmethodstr);
- if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
+ if (compresslevel != 0) /* not 0 or Z_DEFAULT_COMPRESSION */
AppendIntegerCommandOption(&buf, use_new_option_syntax,
"COMPRESSION_LEVEL", compresslevel);
}
@@ -2896,10 +2896,10 @@ main(int argc, char **argv)
}
break;
case COMPRESSION_GZIP:
- if (compresslevel > 9)
+ if (compresslevel > 9 || compresslevel < -1) /* Z_DEFAULT_COMPRESSION */
{
- pg_log_error("compression level %d of method %s higher than maximum of 9",
- compresslevel, "gzip");
+ pg_log_error("compression level %d of method %s out of range (%s)",
+ compresslevel, "gzip", "1..9");
exit(1);
}
if (compressloc == COMPRESS_LOCATION_CLIENT)
@@ -2915,18 +2915,19 @@ main(int argc, char **argv)
}
break;
case COMPRESSION_LZ4:
- if (compresslevel > 12)
+ if (compresslevel > 12 || compresslevel < 0)
{
- pg_log_error("compression level %d of method %s higher than maximum of 12",
- compresslevel, "lz4");
+ pg_log_error("compression level %d of method %s out of range (%s)",
+ compresslevel, "lz4", "1..12");
exit(1);
}
break;
case COMPRESSION_ZSTD:
- if (compresslevel > 22)
+ break; // XXX
+ if (compresslevel > 22 || compresslevel < -7)
{
- pg_log_error("compression level %d of method %s higher than maximum of 22",
- compresslevel, "zstd");
+ pg_log_error("compression level %d of method %s out of range (%s)",
+ compresslevel, "zstd", "-7..22");
exit(1);
}
break;
--
2.17.1
On Mon, Mar 14, 2022 at 1:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
There's some appeal to that, but one downside is that it means that
the client can't be used to fetch data that is compressed in a way
that the server knows about and the client doesn't. I don't think
that's great. Why should, for example, pg_basebackup need to be
compiled with zstd support in order to request zstd compression on the
server side? If the server knows about the brand new
justin-magic-sauce compression algorithm, maybe the client should just
be able to request it and, when given various .jms files by the
server, shrug its shoulders and accept them for what they are. That
doesn't work if -Fp is involved, or similar, but it should work fine
for simple cases if we set things up right.
Concretely, I propose the attached patch for v15. It renames the
newly-added COMPRESSION_LEVEL option to COMPRESSION_DETAIL, introduces
a flexible syntax for options along the lines you proposed, and
adjusts things so that a client that doesn't support a particular type
of compression can still request that type of compression from the
server.
I think it's important to do this for v15 so that we don't end up with
backward-compatibility problems down the road.
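To make the syntax concrete, with the patch all of the following are
accepted (levels chosen arbitrarily; the last one can succeed even if
the client was built without LZ4 support):

$ pg_basebackup -D backup -Ft --compress gzip:5
$ pg_basebackup -D backup -Ft --compress gzip:level=5
$ pg_basebackup -D backup -Ft --compress server-lz4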
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v1-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
From dc3e4fb520b6bfe2cdee66a4fa6133b8fae76b1f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 17 Mar 2022 11:32:04 -0400
Subject: [PATCH v1] Replace BASE_BACKUP COMPRESSION_LEVEL option with
COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
---
doc/src/sgml/protocol.sgml | 18 +-
src/backend/replication/basebackup.c | 60 +--
src/backend/replication/basebackup_gzip.c | 20 +-
src/backend/replication/basebackup_lz4.c | 19 +-
src/backend/replication/basebackup_zstd.c | 19 +-
src/bin/pg_basebackup/bbstreamer.h | 7 +-
src/bin/pg_basebackup/bbstreamer_gzip.c | 7 +-
src/bin/pg_basebackup/bbstreamer_lz4.c | 4 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 4 +-
src/bin/pg_basebackup/pg_basebackup.c | 405 ++++++++-----------
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 33 +-
src/common/Makefile | 1 +
src/common/backup_compression.c | 256 ++++++++++++
src/include/common/backup_compression.h | 44 ++
src/include/replication/basebackup_sink.h | 7 +-
src/tools/msvc/Mkvcbuild.pm | 2 +-
src/tools/pgindent/typedefs.list | 2 +
17 files changed, 594 insertions(+), 314 deletions(-)
create mode 100644 src/common/backup_compression.c
create mode 100644 src/include/common/backup_compression.h
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <term><literal>COMPRESSION_DETAIL</literal> <replaceable>detail</replaceable></term>
<listitem>
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- For <literal>gzip</literal> the value should be an integer between 1
- and 9, for <literal>lz4</literal> between 1 and 12, and for
- <literal>zstd</literal> it should be between 1 and 22.
+ If the value is an integer, it specifies the compression level.
+ Otherwise, it should be a comma-separated list of items, each of
+ the form <literal>keyword</literal> or
+ <literal>keyword=value</literal>. Currently, the only supported
+ keyword is <literal>level</literal>, which sets the compression
+ level.
+ </para>
+
+ <para>
+ For <literal>gzip</literal> the compression level should be an
+ integer between 1 and 9, for <literal>lz4</literal> an integer
+ between 1 and 12, and for <literal>zstd</literal> an integer
+ between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2aedc14a2..afaf4f9ce1 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -18,6 +18,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "common/file_perm.h"
+#include "common/backup_compression.h"
#include "commands/defrem.h"
#include "lib/stringinfo.h"
#include "miscadmin.h"
@@ -54,14 +55,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4,
- BACKUP_COMPRESSION_ZSTD
-} basebackup_compression_type;
-
typedef struct
{
const char *label;
@@ -75,8 +68,8 @@ typedef struct
bool use_copytblspc;
BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
- basebackup_compression_type compression;
- int compression_level;
+ bc_algorithm compression;
+ bc_specification compression_specification;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -713,12 +706,14 @@ parse_basebackup_options(List *options, basebackup_options *opt)
char *target_str = NULL;
char *target_detail_str = NULL;
bool o_compression = false;
- bool o_compression_level = false;
+ bool o_compression_detail = false;
+ char *compression_detail_str = NULL;
MemSet(opt, 0, sizeof(*opt));
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
+ opt->compression_specification.algorithm = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -885,29 +880,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "none") == 0)
- opt->compression = BACKUP_COMPRESSION_NONE;
- else if (strcmp(optval, "gzip") == 0)
- opt->compression = BACKUP_COMPRESSION_GZIP;
- else if (strcmp(optval, "lz4") == 0)
- opt->compression = BACKUP_COMPRESSION_LZ4;
- else if (strcmp(optval, "zstd") == 0)
- opt->compression = BACKUP_COMPRESSION_ZSTD;
- else
+ if (!parse_bc_algorithm(optval, &opt->compression))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized compression algorithm: \"%s\"",
+ errmsg("unrecognized compression algorithm \"%s\"",
optval)));
o_compression = true;
}
- else if (strcmp(defel->defname, "compression_level") == 0)
+ else if (strcmp(defel->defname, "compression_detail") == 0)
{
- if (o_compression_level)
+ if (o_compression_detail)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->compression_level = defGetInt32(defel);
- o_compression_level = true;
+ compression_detail_str = defGetString(defel);
+ o_compression_detail = true;
}
else
ereport(ERROR,
@@ -949,10 +936,25 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_handle =
BaseBackupGetTargetHandle(target_str, target_detail_str);
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("compression level requires compression")));
+
+ if (o_compression)
+ {
+ char *error_detail;
+
+ parse_bc_specification(opt->compression, compression_detail_str,
+ &opt->compression_specification);
+ error_detail =
+ validate_bc_specification(&opt->compression_specification);
+ if (error_detail != NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid compression specification: %s",
+ error_detail));
+ }
}
@@ -998,11 +1000,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
- sink = bbsink_gzip_new(sink, opt.compression_level);
+ sink = bbsink_gzip_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
- sink = bbsink_lz4_new(sink, opt.compression_level);
+ sink = bbsink_lz4_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
- sink = bbsink_zstd_new(sink, opt.compression_level);
+ sink = bbsink_zstd_new(sink, &opt.compression_specification);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index b66d3da7a3..703a91ba77 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_gzip_ops = {
#endif
/*
- * Create a new basebackup sink that performs gzip compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs gzip compression.
*/
bbsink *
-bbsink_gzip_new(bbsink *next, int compresslevel)
+bbsink_gzip_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef HAVE_LIBZ
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,15 +72,14 @@ bbsink_gzip_new(bbsink *next, int compresslevel)
bbsink_gzip *sink;
Assert(next != NULL);
- Assert(compresslevel >= 0 && compresslevel <= 9);
- if (compresslevel == 0)
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
compresslevel = Z_DEFAULT_COMPRESSION;
- else if (compresslevel < 0 || compresslevel > 9)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("gzip compression level %d is out of range",
- compresslevel)));
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 9);
+ }
sink = palloc0(sizeof(bbsink_gzip));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index d838f723d0..06c161ddc4 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_lz4_ops = {
#endif
/*
- * Create a new basebackup sink that performs lz4 compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs lz4 compression.
*/
bbsink *
-bbsink_lz4_new(bbsink *next, int compresslevel)
+bbsink_lz4_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_LZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -72,11 +73,13 @@ bbsink_lz4_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 12)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("lz4 compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 12);
+ }
sink = palloc0(sizeof(bbsink_lz4));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27..96b7985693 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -55,12 +55,13 @@ const bbsink_ops bbsink_zstd_ops = {
#endif
/*
- * Create a new basebackup sink that performs zstd compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs zstd compression.
*/
bbsink *
-bbsink_zstd_new(bbsink *next, int compresslevel)
+bbsink_zstd_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_ZSTD
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,11 +72,13 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 22)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("zstd compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 22);
+ }
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index 02d4c05df6..dfa3f77af4 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -22,6 +22,7 @@
#ifndef BBSTREAMER_H
#define BBSTREAMER_H
+#include "common/backup_compression.h"
#include "lib/stringinfo.h"
#include "pqexpbuffer.h"
@@ -200,17 +201,17 @@ bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
*/
extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 894f857103..1979e95639 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -76,7 +76,8 @@ const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
* closed so that the data may be written there.
*/
bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ bc_specification *compress)
{
#ifdef HAVE_LIBZ
bbstreamer_gzip_writer *streamer;
@@ -115,11 +116,11 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
}
}
- if (gzsetparams(streamer->gzfile, compresslevel,
+ if (gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
+ compress->level, get_gz_error(streamer->gzfile));
exit(1);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index 810052e4e3..a6ec317e2b 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -67,7 +67,7 @@ const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
@@ -89,7 +89,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compresslevel;
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb..caa5edcaf1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -63,7 +63,7 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
@@ -85,7 +85,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
/* Initialize stream compression preferences */
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
+ compress->level);
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2943d9ec1a..88635ad1b0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
#include "access/xlog_internal.h"
#include "bbstreamer.h"
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -57,6 +58,7 @@ typedef struct TablespaceList
typedef struct ArchiveStreamState
{
int tablespacenum;
+ bc_specification *compress;
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
@@ -132,9 +134,6 @@ static bool checksum_failure = false;
static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
-static int compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
-static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -198,7 +197,8 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile);
+ bool expect_unterminated_tarfile,
+ bc_specification *compress);
static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
void *callback_data);
static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
@@ -207,7 +207,7 @@ static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum);
+ bool tablespacenum, bc_specification *compress);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
@@ -215,7 +215,9 @@ static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
static void ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf);
static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
void *callback_data);
-static void BaseBackup(void);
+static void BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc,
+ bc_specification *client_compress);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
bool segment_finished);
@@ -542,7 +544,9 @@ typedef struct
} logstreamer_param;
static int
-LogStreamerMain(logstreamer_param *param)
+LogStreamerMain(logstreamer_param *param,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
StreamCtl stream;
@@ -565,25 +569,14 @@ LogStreamerMain(logstreamer_param *param)
stream.mark_done = true;
stream.partial_suffix = NULL;
stream.replication_slot = replication_slot;
-
if (format == 'p')
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
COMPRESSION_NONE, 0,
stream.do_sync);
- else if (compressloc != COMPRESS_LOCATION_CLIENT)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
- stream.do_sync);
- else if (compressmethod == COMPRESSION_GZIP)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- compressmethod,
- compresslevel,
- stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
+ wal_compress_method,
+ wal_compress_level,
stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
@@ -629,7 +622,9 @@ LogStreamerMain(logstreamer_param *param)
* stream the logfile in parallel with the backups.
*/
static void
-StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
+StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
logstreamer_param *param;
uint32 hi,
@@ -729,7 +724,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
int ret;
/* in child process */
- ret = LogStreamerMain(param);
+ ret = LogStreamerMain(param, wal_compress_method, wal_compress_level);
/* temp debugging aid to analyze 019_replslot_limit failures */
if (verbose)
@@ -1004,136 +999,81 @@ parse_max_rate(char *src)
}
/*
- * Utility wrapper to parse the values specified for -Z/--compress.
- * *methodres and *levelres will be optionally filled with values coming
- * from the parsed results.
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that all the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
+ *
+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwhich, the requested algorithm is "turkey"
+ * and the detail string is "sandwhich". We'll sort out whether that's legal
+ * at a later stage.
*/
static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
- CompressionLocation *locationres, int *levelres)
+parse_compress_options(char *option, char **algorithm, char **detail,
+ CompressionLocation *locationres)
{
char *sep;
- int firstlen;
- char *firstpart;
+ char *endp;
/*
- * clear 'levelres' so that if there are multiple compression options,
- * the last one fully overrides the earlier ones
- */
- *levelres = 0;
-
- /* check if the option is split in two */
- sep = strchr(src, ':');
-
- /*
- * The first part of the option value could be a method name, or just a
- * level value.
- */
- firstlen = (sep != NULL) ? (sep - src) : strlen(src);
- firstpart = pg_malloc(firstlen + 1);
- memcpy(firstpart, src, firstlen);
- firstpart[firstlen] = '\0';
-
- /*
- * Check if the first part of the string matches with a supported
- * compression method.
+ * Check whether the compression specification consists of a bare integer.
+ *
+ * If so, for backward compatibility, assume gzip.
*/
- if (pg_strcasecmp(firstpart, "gzip") == 0)
+ (void) strtol(option, &endp, 10);
+ if (*endp == '\0')
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ *algorithm = pstrdup("gzip");
+ *detail = pstrdup(option);
+ return;
}
- else if (pg_strcasecmp(firstpart, "client-gzip") == 0)
- {
- *methodres = COMPRESSION_GZIP;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-gzip") == 0)
+
+ /* Strip off any "client-" or "server-" prefix. */
+ if (strncmp(option, "server-", 7) == 0)
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
+ option += 7;
}
- else if (pg_strcasecmp(firstpart, "lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ else if (strncmp(option, "client-", 7) == 0)
{
- *methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "none") == 0)
- {
- *methodres = COMPRESSION_NONE;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ option += 7;
}
else
- {
- /*
- * It does not match anything known, so check for the
- * backward-compatible case of only an integer where the implied
- * compression method changes depending on the level value.
- */
- if (!option_parse_int(firstpart, "-Z/--compress", 0,
- INT_MAX, levelres))
- exit(1);
-
- *methodres = (*levelres > 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
- free(firstpart);
- return;
- }
-
+ /*
+ * Check whether there is a compression detail following the algorithm
+ * name.
+ */
+ sep = strchr(option, ':');
if (sep == NULL)
{
- /*
- * The caller specified a method without a colon separator, so let any
- * subsequent checks assign a default level.
- */
- free(firstpart);
- return;
+ *algorithm = pstrdup(option);
+ *detail = NULL;
}
-
- /* Check the contents after the colon separator. */
- sep++;
- if (*sep == '\0')
+ else
{
- pg_log_error("no compression level defined for method %s", firstpart);
- exit(1);
- }
+ char *alg;
- /*
- * For any of the methods currently supported, the data after the
- * separator can just be an integer.
- */
- if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
- levelres))
- exit(1);
+ alg = palloc((sep - option) + 1);
+ memcpy(alg, option, sep - option);
+ alg[sep - option] = '\0';
- free(firstpart);
+ *algorithm = alg;
+ *detail = pstrdup(sep + 1);
+ }
}
/*
@@ -1200,7 +1140,8 @@ static bbstreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile)
+ bool expect_unterminated_tarfile,
+ bc_specification *compress)
{
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
@@ -1316,32 +1257,28 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
- if (compressmethod == COMPRESSION_NONE ||
- compressloc != COMPRESS_LOCATION_CLIENT)
+ if (compress->algorithm == BACKUP_COMPRESSION_NONE)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- else if (compressmethod == COMPRESSION_GZIP)
+ else if (compress->algorithm == BACKUP_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
- archive_file,
- compresslevel);
+ archive_file, compress);
}
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (compress->algorithm == BACKUP_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_lz4_compressor_new(streamer, compress);
}
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (compress->algorithm == BACKUP_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1395,13 +1332,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ if (format == 'p')
{
- if (compressmethod == COMPRESSION_GZIP)
+ if (is_tar_gz)
streamer = bbstreamer_gzip_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (is_tar_lz4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (is_tar_zstd)
streamer = bbstreamer_zstd_decompressor_new(streamer);
}
@@ -1415,13 +1352,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* manifest if present - as a single COPY stream.
*/
static void
-ReceiveArchiveStream(PGconn *conn)
+ReceiveArchiveStream(PGconn *conn, bc_specification *compress)
{
ArchiveStreamState state;
/* Set up initial state. */
memset(&state, 0, sizeof(state));
state.tablespacenum = -1;
+ state.compress = compress;
/* All the real work happens in ReceiveArchiveStreamChunk. */
ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
@@ -1542,7 +1480,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
CreateBackupStreamer(archive_name,
spclocation,
&state->manifest_inject_streamer,
- true, false);
+ true, false,
+ state->compress);
}
break;
}
@@ -1743,7 +1682,7 @@ ReportCopyDataParseError(size_t r, char *copybuf)
*/
static void
ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum)
+ bool tablespacenum, bc_specification *compress)
{
WriteTarState state;
bbstreamer *manifest_inject_streamer;
@@ -1759,7 +1698,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
state.streamer = CreateBackupStreamer(archive_name, spclocation,
&manifest_inject_streamer,
is_recovery_guc_supported,
- expect_unterminated_tarfile);
+ expect_unterminated_tarfile,
+ compress);
state.tablespacenum = tablespacenum;
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
progress_update_filename(NULL);
@@ -1902,7 +1842,8 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
}
static void
-BaseBackup(void)
+BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc, bc_specification *client_compress)
{
PGresult *res;
char *sysidentifier;
@@ -2055,33 +1996,17 @@ BaseBackup(void)
if (compressloc == COMPRESS_LOCATION_SERVER)
{
- char *compressmethodstr = NULL;
-
if (!use_new_option_syntax)
{
pg_log_error("server does not support server-side compression");
exit(1);
}
- switch (compressmethod)
- {
- case COMPRESSION_GZIP:
- compressmethodstr = "gzip";
- break;
- case COMPRESSION_LZ4:
- compressmethodstr = "lz4";
- break;
- case COMPRESSION_ZSTD:
- compressmethodstr = "zstd";
- break;
- default:
- Assert(false);
- break;
- }
AppendStringCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION", compressmethodstr);
- if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
- AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION_LEVEL", compresslevel);
+ "COMPRESSION", compression_algorithm);
+ if (compression_detail != NULL)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_DETAIL",
+ compression_detail);
}
if (verbose)
@@ -2207,15 +2132,33 @@ BaseBackup(void)
*/
if (includewal == STREAM_WAL)
{
+ WalCompressionMethod wal_compress_method;
+ int wal_compress_level;
+
if (verbose)
pg_log_info("starting background WAL receiver");
- StartLogStreamer(xlogstart, starttli, sysidentifier);
+
+ if (client_compress->algorithm == BACKUP_COMPRESSION_GZIP)
+ {
+ wal_compress_method = COMPRESSION_GZIP;
+ wal_compress_level =
+ (client_compress->options & BACKUP_COMPRESSION_OPTION_LEVEL)
+ != 0 ? client_compress->level : 0;
+ }
+ else
+ {
+ wal_compress_method = COMPRESSION_NONE;
+ wal_compress_level = 0;
+ }
+
+ StartLogStreamer(xlogstart, starttli, sysidentifier,
+ wal_compress_method, wal_compress_level);
}
if (serverMajor >= 1500)
{
/* Receive a single tar stream with everything. */
- ReceiveArchiveStream(conn);
+ ReceiveArchiveStream(conn, client_compress);
}
else
{
@@ -2244,7 +2187,8 @@ BaseBackup(void)
spclocation = PQgetvalue(res, i, 1);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ ReceiveTarFile(conn, archive_name, spclocation, i,
+ client_compress);
}
/*
@@ -2511,6 +2455,10 @@ main(int argc, char **argv)
int c;
int option_index;
+ char *compression_algorithm = "none";
+ char *compression_detail = NULL;
+ CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
+ bc_specification client_compress;
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2616,17 +2564,13 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
- compressmethod = COMPRESSION_GZIP;
+ compression_algorithm = "gzip";
+ compression_detail = NULL;
compressloc = COMPRESS_LOCATION_UNSPECIFIED;
break;
case 'Z':
- parse_compress_options(optarg, &compressmethod,
- &compressloc, &compresslevel);
+ parse_compress_options(optarg, &compression_algorithm,
+ &compression_detail, &compressloc);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2753,12 +2697,11 @@ main(int argc, char **argv)
}
/*
- * If we're compressing the backup and the user has not said where to
- * perform the compression, do it on the client, unless they specified
- * --target, in which case the server is the only choice.
+ * If the user has not specified where to perform backup compression,
+ * default to the client, unless the user specified --target, in which case
+ * the server is the only choice.
*/
- if (compressmethod != COMPRESSION_NONE &&
- compressloc == COMPRESS_LOCATION_UNSPECIFIED)
+ if (compressloc == COMPRESS_LOCATION_UNSPECIFIED)
{
if (backup_target == NULL)
compressloc = COMPRESS_LOCATION_CLIENT;
@@ -2766,6 +2709,40 @@ main(int argc, char **argv)
compressloc = COMPRESS_LOCATION_SERVER;
}
+ /*
+ * If any compression that we're doing is happening on the client side,
+ * we must try to parse the compression algorithm and detail, but if it's
+ * all on the server side, then we're just going to pass through whatever
+ * was requested and let the server decide what to do.
+ */
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ bc_algorithm alg;
+ char *error_detail;
+
+ if (!parse_bc_algorithm(compression_algorithm, &alg))
+ {
+ pg_log_error("unrecognized compression algorithm \"%s\"",
+ compression_algorithm);
+ exit(1);
+ }
+
+ parse_bc_specification(alg, compression_detail, &client_compress);
+ error_detail = validate_bc_specification(&client_compress);
+ if (error_detail != NULL)
+ {
+ pg_log_error("invalid compression specification: %s",
+ error_detail);
+ exit(1);
+ }
+ }
+ else
+ {
+ Assert(compressloc == COMPRESS_LOCATION_SERVER);
+ client_compress.algorithm = BACKUP_COMPRESSION_NONE;
+ client_compress.options = 0;
+ }
+
/*
* Can't perform client-side compression if the backup is not being
* sent to the client.
@@ -2779,9 +2756,10 @@ main(int argc, char **argv)
}
/*
- * Compression doesn't make sense unless tar format is in use.
+ * Client-side compression doesn't make sense unless tar format is in use.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT)
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT &&
+ client_compress.algorithm != BACKUP_COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2882,56 +2860,6 @@ main(int argc, char **argv)
}
}
- /* Sanity checks for compression-related options. */
- switch (compressmethod)
- {
- case COMPRESSION_NONE:
- if (compresslevel != 0)
- {
- pg_log_error("cannot use compression level with method %s",
- "none");
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
- progname);
- exit(1);
- }
- break;
- case COMPRESSION_GZIP:
- if (compresslevel > 9)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 9",
- compresslevel, "gzip");
- exit(1);
- }
- if (compressloc == COMPRESS_LOCATION_CLIENT)
- {
-#ifdef HAVE_LIBZ
- if (compresslevel == 0)
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- pg_log_error("this build does not support compression with %s",
- "gzip");
- exit(1);
-#endif
- }
- break;
- case COMPRESSION_LZ4:
- if (compresslevel > 12)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 12",
- compresslevel, "lz4");
- exit(1);
- }
- break;
- case COMPRESSION_ZSTD:
- if (compresslevel > 22)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 22",
- compresslevel, "zstd");
- exit(1);
- }
- break;
- }
-
/*
* Sanity checks for progress reporting options.
*/
@@ -3040,7 +2968,8 @@ main(int argc, char **argv)
free(linkloc);
}
- BaseBackup();
+ BaseBackup(compression_algorithm, compression_detail, compressloc,
+ &client_compress);
success = true;
return 0;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index efefe947d9..a05abdadbe 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -42,16 +42,24 @@ $node->command_fails(['pg_basebackup'],
# Sanity checks for options
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:1' ],
- qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure if method "none" specified with compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none+' ],
- qr/\Qpg_basebackup: error: invalid value "none+" for option/,
+ qr/\Qunrecognized compression algorithm "none+"/,
'failure on incorrect separator to define compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:' ],
- qr/\Qpg_basebackup: error: no compression level defined for method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure on missing compression level value');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'gzip:thunk' ],
+ qr/\Qunknown compression option "thunk"/,
+ 'failure on missing compression level value');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'gzip:level=high' ],
+ qr/\Qvalue for compression option "level" must be an integer/,
+ 'failure on non-numeric compression level');
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
@@ -89,6 +97,25 @@ print $conf "wal_level = replica\n";
close $conf;
$node->restart;
+# Now that we have a server that supports replication commands, test whether
+# certain scenarios fail on the client side or on the server side.
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'gzip:level=236' ],
+ qr/\Qpg_basebackup: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9/,
+ 'client failure on out-of-range compression level');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'server-gzip:level=236' ],
+ qr/\Qpg_basebackup: error: could not initiate base backup: ERROR: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9/,
+ 'server failure on out-of-range compression level');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'client-extrasquishy' ],
+ qr/\Qpg_basebackup: error: unrecognized compression algorithm "extrasquishy"/,
+ 'client failure on invalid compression algorithm');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'server-extrasquishy' ],
+ qr/\Qpg_basebackup: error: could not initiate base backup: ERROR: unrecognized compression algorithm "extrasquishy"/,
+ 'server failure on invalid compression algorithm');
+
# Write some files to test that they are not copied.
foreach my $filename (
qw(backup_label tablespace_map postgresql.auto.conf.tmp
diff --git a/src/common/Makefile b/src/common/Makefile
index 31c0dd366d..f627349835 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -47,6 +47,7 @@ LIBS += $(PTHREAD_LIBS)
OBJS_COMMON = \
archive.o \
+ backup_compression.o \
base64.o \
checksum_helper.o \
config_info.o \
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
new file mode 100644
index 0000000000..f943345421
--- /dev/null
+++ b/src/common/backup_compression.c
@@ -0,0 +1,256 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.c
+ *
+ * Shared code for backup compression methods and specifications.
+ *
+ * A compression specification specifies the parameters that should be used
+ * when * performing compression with a specific algorithm. The simplest
+ * possible compression specification is an integer, which sets the
+ * compression level.
+ *
+ * Otherwise, a compression specification is a comma-separated list of items,
+ * each having the form keyword or keyword=value.
+ *
+ * Currently, the only supported keyword is "level".
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.c
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/backup_compression.h"
+
+static int expect_integer_value(char *keyword, char *value,
+ bc_specification *result);
+
+/*
+ * Look up a compression algorithm by name. Returns true and sets *algorithm
+ * if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_bc_algorithm(char *name, bc_algorithm *algorithm)
+{
+ if (strcmp(name, "none") == 0)
+ *algorithm = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(name, "gzip") == 0)
+ *algorithm = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(name, "lz4") == 0)
+ *algorithm = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(name, "zstd") == 0)
+ *algorithm = BACKUP_COMPRESSION_ZSTD;
+ else
+ return false;
+ return true;
+}
+
+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+char *
+get_bc_algorithm_name(bc_algorithm algorithm)
+{
+ switch (algorithm)
+ {
+ case BACKUP_COMPRESSION_NONE:
+ return "none";
+ case BACKUP_COMPRESSION_GZIP:
+ return "gzip";
+ case BACKUP_COMPRESSION_LZ4:
+ return "lz4";
+ case BACKUP_COMPRESSION_ZSTD:
+ return "zstd";
+ /* no default, to provoke compiler warnings if values are added */
+ }
+ Assert(false);
+ return "???"; /* unreachable, keep compiler quiet */
+}
+
+/*
+ * Parse a compression specification for a specified algorithm.
+ *
+ * See the file header comments for a brief description of what a compression
+ * specification is expected to look like.
+ *
+ * On return, all fields of the result object will be initialized.
+ * In particular, result->parse_error will contain an appropriate error message
+ * if errors were found during parsing, and will be NULL otherwise. However,
+ * even if there's no parse error, the string might not make sense: e.g.
+ * for gzip, level=12 is not sensible, but it does parse OK.
+ *
+ * Use validate_bc_specification() to find out whether a compression
+ * specification is semantically sensible.
+ */
+void
+parse_bc_specification(bc_algorithm algorithm, char *specification,
+ bc_specification *result)
+{
+ int bare_level;
+ char *bare_level_endp;
+
+ /* Initial setup of result object. */
+ result->algorithm = algorithm;
+ result->options = 0;
+ result->level = -1;
+ result->parse_error = NULL;
+
+ /* If there is no specification, we're done already. */
+ if (specification == NULL)
+ return;
+
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
+ if (*bare_level_endp == '\0')
+ {
+ result->level = bare_level;
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ return;
+ }
+
+ /* Look for comma-separated keyword or keyword=value entries. */
+ while (1)
+ {
+ char *kwstart;
+ char *kwend;
+ char *vstart;
+ char *vend;
+ int kwlen;
+ int vlen;
+ char *keyword;
+ char *value;
+
+ /* Figure start, end, and length of next keyword and any value. */
+ kwstart = kwend = specification;
+ while (*kwend != '\0' && *kwend != ',' && *kwend != '=')
+ ++kwend;
+ kwlen = kwend - kwstart;
+ if (*kwend != '=')
+ {
+ vstart = vend = NULL;
+ vlen = 0;
+ }
+ else
+ {
+ vstart = vend = kwend + 1;
+ while (*vend != '\0' && *vend != ',')
+ ++vend;
+ vlen = vend - vstart;
+ }
+
+ /* Reject empty keyword. */
+ if (kwlen == 0)
+ {
+ result->parse_error =
+ pstrdup("found empty string where a compression option was expected");
+ break;
+ }
+
+ /* Extract keyword and value as separate C strings. */
+ keyword = palloc(kwlen + 1);
+ memcpy(keyword, kwstart, kwlen);
+ keyword[kwlen] = '\0';
+ if (vlen == 0)
+ value = NULL;
+ else
+ {
+ value = palloc(vlen + 1);
+ memcpy(value, vstart, vlen);
+ value[vlen] = '\0';
+ }
+
+ /* Handle whatever keyword we found. */
+ if (strcmp(keyword, "level") == 0)
+ {
+ result->level = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ }
+ else
+ result->parse_error =
+ psprintf("unknown compression option \"%s\"", keyword);
+
+ /* Release memory, just to be tidy. */
+ pfree(keyword);
+ if (value != NULL)
+ pfree(value);
+
+ /* If we got an error or have reached the end of the string, stop. */
+ if (result->parse_error != NULL ||
+ (vend == NULL ? *kwend == '\0' : *vend == '\0'))
+ break;
+
+ /* Advance to next entry and loop around. */
+ specification = vend == NULL ? kwend + 1 : vend + 1;
+ }
+}
+
+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ ivalue = strtol(value, &ivalue_endp, 10);
+ if (*ivalue_endp != '\0')
+ {
+ result->parse_error =
+ psprintf("value for compression option \"%s\" must be an integer",
+ keyword);
+ return -1;
+ }
+ return ivalue;
+}
+
+/*
+ * Returns NULL if the compression specification string was syntactically
+ * valid and semantically sensible. Otherwise, returns an error message.
+ *
+ * Does not test whether this build of PostgreSQL supports the requested
+ * compression method.
+ */
+char *
+validate_bc_specification(bc_specification *spec)
+{
+ /* If it didn't even parse OK, it's definitely no good. */
+ if (spec->parse_error != NULL)
+ return spec->parse_error;
+
+ /*
+ * If a compression level was specified, check that the algorithm expects
+ * a compression level and that the level is within the legal range for
+ * the algorithm.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ int min_level = 1;
+ int max_level;
+
+ if (spec->algorithm == BACKUP_COMPRESSION_GZIP)
+ max_level = 9;
+ else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
+ max_level = 12;
+ else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ max_level = 22;
+ else
+ return psprintf("compression algorithm \"%s\" does not accept a compression level",
+ get_bc_algorithm_name(spec->algorithm));
+
+ if (spec->level < min_level || spec->level > max_level)
+ return psprintf("compression algorithm \"%s\" expects a compression level between %d and %d",
+ get_bc_algorithm_name(spec->algorithm),
+ min_level, max_level);
+ }
+
+ return NULL;
+}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
new file mode 100644
index 0000000000..98c1e75a80
--- /dev/null
+++ b/src/include/common/backup_compression.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.h
+ *
+ * Shared definitions for backup compression methods and specifications.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BACKUP_COMPRESSION_H
+#define BACKUP_COMPRESSION_H
+
+typedef enum bc_algorithm
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
+} bc_algorithm;
+
+#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+
+typedef struct bc_specification
+{
+ bc_algorithm algorithm;
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
+ int level;
+ char *parse_error; /* NULL if parsing was OK, else message */
+} bc_specification;
+
+extern bool parse_bc_algorithm(char *name, bc_algorithm *algorithm);
+extern char *get_bc_algorithm_name(bc_algorithm algorithm);
+
+extern void parse_bc_specification(bc_algorithm algorithm,
+ char *specification,
+ bc_specification *result);
+
+extern char *validate_bc_specification(bc_specification *);
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a7f16758a4..654df28576 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -27,6 +27,7 @@
#define BASEBACKUP_SINK_H
#include "access/xlog_internal.h"
+#include "common/backup_compression.h"
#include "nodes/pg_list.h"
/* Forward declarations. */
@@ -283,9 +284,9 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
-extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_gzip_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_lz4_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_zstd_new(bbsink *next, bc_specification *);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 441d6ae6bf..de8676d339 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -124,7 +124,7 @@ sub mkvcbuild
}
our @pgcommonallfiles = qw(
- archive.c base64.c checksum_helper.c
+ archive.c backup_compression.c base64.c checksum_helper.c
config_info.c controldata_utils.c d2s.c encnames.c exec.c
f2s.c file_perm.c file_utils.c hashfn.c ip.c jsonapi.c
keywords.c kwlookup.c link-canary.c md5_common.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaf3e7a8d4..01748cac07 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3788,3 +3788,5 @@ yyscan_t
z_stream
z_streamp
zic_t
+bc_algorithm
+bc_specification
--
2.24.3 (Apple Git-128)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
+ <para>
+ For <literal>gzip</literal> the compression level should be an
gzip comma
+++ b/src/backend/replication/basebackup.c
@@ -18,6 +18,7 @@
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
#include "common/file_perm.h"
+#include "common/backup_compression.h"
alphabetical
- errmsg("unrecognized compression algorithm: \"%s\"",
+ errmsg("unrecognized compression algorithm \"%s\"",
Most other places seem to say "compression method". So I'd suggest to change
that here, and in doc/src/sgml/ref/pg_basebackup.sgml.
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("compression level requires compression")));
s/level/detail/
/*
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that all the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
..
*/
static void
+parse_compress_options(char *option, char **algorithm, char **detail,
+ CompressionLocation *locationres)
It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
support at least level=N. And eventually pg_dump. But those clients shouldn't
accept a client/server prefix. Maybe the way to handle that is for those tools
to check locationres and reject it if it was specified.
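Roughly like this, say (hypothetical sketch -- assumes the function were
exported from somewhere shared and that CompressionLocation came along
with it):

char *algorithm;
char *detail;
CompressionLocation loc;

parse_compress_options(optarg, &algorithm, &detail, &loc);

/* pg_dump and friends have no notion of client vs. server */
if (loc != COMPRESS_LOCATION_UNSPECIFIED)
{
	pg_log_error("compression location cannot be specified here");
	exit(1);
}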
+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwhich, the requested algorithm is "turkey"
+ * and the detail string is "sandwhich". We'll sort out whether that's legal
sp: sandwich
+ WalCompressionMethod wal_compress_method;
This is confusingly similar to src/include/access/xlog.h:WalCompression.
I think someone else mentioned this before ?
+ * A compression specification specifies the parameters that should be used
+ * when * performing compression with a specific algorithm. The simplest
star
+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+char *
+get_bc_algorithm_name(bc_algorithm algorithm)
should be const ?
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
Should this call expect_integer_value()?
See below.
+ result->parse_error =
+ pstrdup("found empty string where a compression option was expected");
Needs to be localized with _() ?
Also, document that it's pstrdup'd.
+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)
-1 isn't great, since it's also an integer, and, also a valid compression level
for zstd (did you see my message about that?). Maybe INT_MIN is ok.
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ ivalue = strtol(value, &ivalue_endp, 10);
Should this also set/check errno ?
And check if value != ivalue_endp ?
See strtol(3)
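i.e. the usual dance, something like this (untested; assumes <errno.h>
and <limits.h>):

long lvalue;

errno = 0;
lvalue = strtol(value, &ivalue_endp, 10);
if (ivalue_endp == value ||	/* no digits at all */
	*ivalue_endp != '\0' ||	/* trailing garbage */
	errno == ERANGE ||	/* out of range for long */
	lvalue < INT_MIN || lvalue > INT_MAX)	/* out of range for int */
{
	result->parse_error =
		psprintf("value for compression option \"%s\" must be an integer",
				 keyword);
	return INT_MIN;
}
ivalue = (int) lvalue;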
+char *
+validate_bc_specification(bc_specification *spec)
...
+ /*
+ * If a compression level was specified, check that the algorithm expects
+ * a compression level and that the level is within the legal range for
+ * the algorithm.
It would be nice if this could be shared with wal_compression and pg_dump.
We shouldn't need multiple places with structures giving the algorithms and
range of compression levels.
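e.g. one shared, table-driven thing along these lines (names invented
for illustration):

typedef struct
{
	bc_algorithm algorithm;
	const char *name;
	int min_level;
	int max_level;
} compression_algorithm_info;

static const compression_algorithm_info algorithm_info[] = {
	{BACKUP_COMPRESSION_NONE, "none", 0, 0},
	{BACKUP_COMPRESSION_GZIP, "gzip", 1, 9},
	{BACKUP_COMPRESSION_LZ4, "lz4", 1, 12},
	{BACKUP_COMPRESSION_ZSTD, "zstd", 1, 22},
};

Then validate_bc_specification(), wal_compression, and pg_dump could all
range-check against the same array instead of each hard-coding 9/12/22.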
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
Should be "unsigned int" or "bits32" ?
The server crashes if I send an unknown option - you should hit that in the
regression tests.
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4:a |wc -c
TRAP: FailedAssertion("pointer != NULL", File: "../../../../src/include/utils/memutils.h", Line: 123, PID: 8627)
postgres: walsender pryzbyj [local] BASE_BACKUP(ExceptionalCondition+0xa0)[0x560b45d7b64b]
postgres: walsender pryzbyj [local] BASE_BACKUP(pfree+0x5d)[0x560b45dad1ea]
postgres: walsender pryzbyj [local] BASE_BACKUP(parse_bc_specification+0x154)[0x560b45dc5d4f]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x43d56c)[0x560b45bc556c]
postgres: walsender pryzbyj [local] BASE_BACKUP(SendBaseBackup+0x2d)[0x560b45bc85ca]
postgres: walsender pryzbyj [local] BASE_BACKUP(exec_replication_command+0x3a2)[0x560b45bdddb2]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostgresMain+0x6b2)[0x560b45c39131]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x40530e)[0x560b45b8d30e]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x408572)[0x560b45b90572]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x4087b9)[0x560b45b907b9]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostmasterMain+0x1135)[0x560b45b91d9b]
postgres: walsender pryzbyj [local] BASE_BACKUP(main+0x229)[0x560b45ad0f78]
This is interpreted like client-gzip-1; should multiple specifications of
compress be prohibited ?
| src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4 --compress=1
Thanks for the review!
I'll address most of these comments later, but quickly for right now...
On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
support at least level=N. And eventually pg_dump. But those clients shouldn't
accept a client/server prefix. Maybe the way to handle that is for those tools
to check locationres and reject it if it was specified.
[...]
This is confusingly similar to src/include/access/xlog.h:WalCompression.
I think someone else mentioned this before ?
A couple of people before me have had delusions of grandeur in this
area. We have the WalCompression enum, which has values of the form
COMPRESSION_*, instead of WAL_COMPRESSION_*, as if the WAL were going
to be the only thing that ever got compressed. And pg_dump.h also has
a CompressionAlgorithm enum, with values like COMPR_ALG_*, which isn't
great naming either. Clearly there's some cleanup needed here: if we
can use the same enum for multiple systems, then it can have a name
implying that it's the only game in town, but otherwise both the enum
name and the corresponding value need to use a suitable prefix. I
think that's a job for another patch, probably post-v15. For now I
plan to do the right thing with the new names I'm adding, and leave
the existing names alone. That can be changed in the future, if and
when it seems sensible.
As I said elsewhere, I think the WAL compression stuff is badly
designed and should probably be rewritten completely, maybe to reuse
the bbstreamer stuff. In that case, WalCompressionMethod would
probably go away entirely, making the naming confusion moot, and
picking up zstd and lz4 compression support for free. If that doesn't
happen, we can probably find some way to at least make them share an
enum, but I think that's too hairy to try to clean up right now with
feature freeze pending.
The server crashes if I send an unknown option - you should hit that in the
regression tests.
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4:a |wc -c
TRAP: FailedAssertion("pointer != NULL", File: "../../../../src/include/utils/memutils.h", Line: 123, PID: 8627)
postgres: walsender pryzbyj [local] BASE_BACKUP(ExceptionalCondition+0xa0)[0x560b45d7b64b]
postgres: walsender pryzbyj [local] BASE_BACKUP(pfree+0x5d)[0x560b45dad1ea]
postgres: walsender pryzbyj [local] BASE_BACKUP(parse_bc_specification+0x154)[0x560b45dc5d4f]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x43d56c)[0x560b45bc556c]
postgres: walsender pryzbyj [local] BASE_BACKUP(SendBaseBackup+0x2d)[0x560b45bc85ca]
postgres: walsender pryzbyj [local] BASE_BACKUP(exec_replication_command+0x3a2)[0x560b45bdddb2]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostgresMain+0x6b2)[0x560b45c39131]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x40530e)[0x560b45b8d30e]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x408572)[0x560b45b90572]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x4087b9)[0x560b45b907b9]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostmasterMain+0x1135)[0x560b45b91d9b]
postgres: walsender pryzbyj [local] BASE_BACKUP(main+0x229)[0x560b45ad0f78]
That's odd - I thought I had tested that case. Will double-check.
This is interpreted like client-gzip-1; should multiple specifications of
compress be prohibited ?
| src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4 --compress=1
They're not now and haven't been in the past. I think the last one
should just win (as it apparently does, here). We do that in some
places and throw an error in others and I'm not sure if we have a 100%
consistent rule for it, but flipping one location between one behavior
and the other isn't going to make things more consistent overall.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
gzip comma
I think it's fine the way it's written. If we made that change, then
we'd have a comma for gzip and not for the other two algorithms. Also,
I'm just moving that sentence, so any change that there is to be made
here is a job for some other patch.
alphabetical
Fixed.
- errmsg("unrecognized compression algorithm: \"%s\"", + errmsg("unrecognized compression algorithm \"%s\"",Most other places seem to say "compression method". So I'd suggest to change
that here, and in doc/src/sgml/ref/pg_basebackup.sgml.
I'm not sure that's really better, and I don't think this patch is
introducing an altogether novel usage. I think I would probably try to
standardize on algorithm rather than method if I were standardizing
the whole source tree, but I think we can leave that discussion for
another time.
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("compression level requires compression")));
s/level/detail/
Fixed.
It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
support at least level=N. And eventually pg_dump. But those clients shouldn't
accept a client/server prefix. Maybe the way to handle that is for those tools
to check locationres and reject it if it was specified.
One thing I forgot to mention in my previous response is that I think
the parsing code is actually well set up for this the way I have it.
server- and client- get parsed off in a different place than we
interpret the rest, which fits well with your observation that other
cases wouldn't have a client or server prefix.
sp: sandwich
Fixed.
star
Fixed.
should be const ?
OK.
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
Should this call expect_integer_value()?
See below.
I don't think that would be useful. We have no keyword to pass for the
error message, nor would we use the error message if one got
constructed.
+ result->parse_error =
+ pstrdup("found empty string where a compression option was expected");
Needs to be localized with _() ?
Also, document that it's pstrdup'd.
Did the latter. The former would need to be fixed in a bunch of places
and while I'm happy to accept an expert opinion on exactly what needs
to be done here, I don't want to try to do it and do it wrong. Better
to let someone with good knowledge of the subject matter patch it up
later than do a crummy job now.
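(If it really is as simple as it looks, presumably it's just

result->parse_error =
	pstrdup(_("found empty string where a compression option was expected"));

and similarly for the psprintf() cases -- but whether strings in
src/common/ get picked up for translation in both frontend and backend
builds is exactly the part I don't want to guess at.)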
-1 isn't great, since it's also an integer, and, also a valid compression level
for zstd (did you see my message about that?). Maybe INT_MIN is ok.
It really doesn't matter. Could just return 42. The client shouldn't
use the value if there's an error.
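To spell out the pattern, the pg_basebackup side of the patch does
essentially this:

parse_bc_specification(alg, compression_detail, &client_compress);
error_detail = validate_bc_specification(&client_compress);
if (error_detail != NULL)
{
	pg_log_error("invalid compression specification: %s", error_detail);
	exit(1);
}

with validate_bc_specification() expected to hand back any parse_error
first, so a sentinel return value never reaches code that would interpret
it as a real level.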
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ ivalue = strtol(value, &ivalue_endp, 10);
Should this also set/check errno ?
And check if value != ivalue_endp ?
See strtol(3)
Even after reading the man page for strtol, it's not clear to me that
this is needed. That page represents checking *endptr != '\0' as
sufficient to tell whether an error occurred. Maybe it wouldn't catch
an out of range value, but in practice all of the algorithms we
support now and any we support in the future are going to catch
something clamped to LONG_MIN or LONG_MAX as out of range and display
the correct error message. What's your specific thinking here?
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
Should be "unsigned int" or "bits32" ?
I do not see why either of those would be better.
The server crashes if I send an unknown option - you should hit that in the
regression tests.
Turns out I was testing this on the client side but not the server
side. Fixed and added more tests.
v2 attached.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
From d157b30adadb26ffd4f0262d0bc53119eec6e8e7 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Sun, 20 Mar 2022 14:48:19 -0400
Subject: [PATCH v2] Replace BASE_BACKUP COMPRESSION_LEVEL option with
COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby.
---
doc/src/sgml/protocol.sgml | 18 +-
src/backend/replication/basebackup.c | 62 +--
src/backend/replication/basebackup_gzip.c | 20 +-
src/backend/replication/basebackup_lz4.c | 19 +-
src/backend/replication/basebackup_zstd.c | 19 +-
src/bin/pg_basebackup/bbstreamer.h | 7 +-
src/bin/pg_basebackup/bbstreamer_gzip.c | 7 +-
src/bin/pg_basebackup/bbstreamer_lz4.c | 4 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 4 +-
src/bin/pg_basebackup/pg_basebackup.c | 405 ++++++++-----------
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 41 +-
src/common/Makefile | 1 +
src/common/backup_compression.c | 258 ++++++++++++
src/include/common/backup_compression.h | 44 ++
src/include/replication/basebackup_sink.h | 7 +-
src/tools/msvc/Mkvcbuild.pm | 2 +-
src/tools/pgindent/typedefs.list | 2 +
17 files changed, 605 insertions(+), 315 deletions(-)
create mode 100644 src/common/backup_compression.c
create mode 100644 src/include/common/backup_compression.h
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <term><literal>COMPRESSION_DETAIL</literal> <replaceable>detail</replaceable></term>
<listitem>
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- For <literal>gzip</literal> the value should be an integer between 1
- and 9, for <literal>lz4</literal> between 1 and 12, and for
- <literal>zstd</literal> it should be between 1 and 22.
+ If the value is an integer, it specifies the compression level.
+ Otherwise, it should be a comma-separated list of items, each of
+ the form <literal>keyword</literal> or
+ <literal>keyword=value</literal>. Currently, the only supported
+ keyword is <literal>level</literal>, which sets the compression
+ level.
+ </para>
+
+ <para>
+ For <literal>gzip</literal> the compression level should be an
+ integer between 1 and 9, for <literal>lz4</literal> an integer
+ between 1 and 12, and for <literal>zstd</literal> an integer
+ between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2aedc14a2..49deead091 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,6 +17,7 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
#include "lib/stringinfo.h"
@@ -54,14 +55,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4,
- BACKUP_COMPRESSION_ZSTD
-} basebackup_compression_type;
-
typedef struct
{
const char *label;
@@ -75,8 +68,8 @@ typedef struct
bool use_copytblspc;
BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
- basebackup_compression_type compression;
- int compression_level;
+ bc_algorithm compression;
+ bc_specification compression_specification;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -713,12 +706,14 @@ parse_basebackup_options(List *options, basebackup_options *opt)
char *target_str = NULL;
char *target_detail_str = NULL;
bool o_compression = false;
- bool o_compression_level = false;
+ bool o_compression_detail = false;
+ char *compression_detail_str = NULL;
MemSet(opt, 0, sizeof(*opt));
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
+ opt->compression_specification.algorithm = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -885,29 +880,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "none") == 0)
- opt->compression = BACKUP_COMPRESSION_NONE;
- else if (strcmp(optval, "gzip") == 0)
- opt->compression = BACKUP_COMPRESSION_GZIP;
- else if (strcmp(optval, "lz4") == 0)
- opt->compression = BACKUP_COMPRESSION_LZ4;
- else if (strcmp(optval, "zstd") == 0)
- opt->compression = BACKUP_COMPRESSION_ZSTD;
- else
+ if (!parse_bc_algorithm(optval, &opt->compression))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized compression algorithm: \"%s\"",
+ errmsg("unrecognized compression algorithm \"%s\"",
optval)));
o_compression = true;
}
- else if (strcmp(defel->defname, "compression_level") == 0)
+ else if (strcmp(defel->defname, "compression_detail") == 0)
{
- if (o_compression_level)
+ if (o_compression_detail)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->compression_level = defGetInt32(defel);
- o_compression_level = true;
+ compression_detail_str = defGetString(defel);
+ o_compression_detail = true;
}
else
ereport(ERROR,
@@ -949,10 +936,25 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_handle =
BaseBackupGetTargetHandle(target_str, target_detail_str);
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("compression level requires compression")));
+ errmsg("compression detail requires compression")));
+
+ if (o_compression)
+ {
+ char *error_detail;
+
+ parse_bc_specification(opt->compression, compression_detail_str,
+ &opt->compression_specification);
+ error_detail =
+ validate_bc_specification(&opt->compression_specification);
+ if (error_detail != NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid compression specification: %s",
+ error_detail));
+ }
}
@@ -998,11 +1000,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
- sink = bbsink_gzip_new(sink, opt.compression_level);
+ sink = bbsink_gzip_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
- sink = bbsink_lz4_new(sink, opt.compression_level);
+ sink = bbsink_lz4_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
- sink = bbsink_zstd_new(sink, opt.compression_level);
+ sink = bbsink_zstd_new(sink, &opt.compression_specification);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index b66d3da7a3..703a91ba77 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_gzip_ops = {
#endif
/*
- * Create a new basebackup sink that performs gzip compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs gzip compression.
*/
bbsink *
-bbsink_gzip_new(bbsink *next, int compresslevel)
+bbsink_gzip_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef HAVE_LIBZ
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,15 +72,14 @@ bbsink_gzip_new(bbsink *next, int compresslevel)
bbsink_gzip *sink;
Assert(next != NULL);
- Assert(compresslevel >= 0 && compresslevel <= 9);
- if (compresslevel == 0)
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
compresslevel = Z_DEFAULT_COMPRESSION;
- else if (compresslevel < 0 || compresslevel > 9)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("gzip compression level %d is out of range",
- compresslevel)));
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 9);
+ }
sink = palloc0(sizeof(bbsink_gzip));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index d838f723d0..06c161ddc4 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_lz4_ops = {
#endif
/*
- * Create a new basebackup sink that performs lz4 compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs lz4 compression.
*/
bbsink *
-bbsink_lz4_new(bbsink *next, int compresslevel)
+bbsink_lz4_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_LZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -72,11 +73,13 @@ bbsink_lz4_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 12)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("lz4 compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 12);
+ }
sink = palloc0(sizeof(bbsink_lz4));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27..96b7985693 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -55,12 +55,13 @@ const bbsink_ops bbsink_zstd_ops = {
#endif
/*
- * Create a new basebackup sink that performs zstd compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs zstd compression.
*/
bbsink *
-bbsink_zstd_new(bbsink *next, int compresslevel)
+bbsink_zstd_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_ZSTD
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,11 +72,13 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 22)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("zstd compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 22);
+ }
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index 02d4c05df6..dfa3f77af4 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -22,6 +22,7 @@
#ifndef BBSTREAMER_H
#define BBSTREAMER_H
+#include "common/backup_compression.h"
#include "lib/stringinfo.h"
#include "pqexpbuffer.h"
@@ -200,17 +201,17 @@ bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
*/
extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 894f857103..1979e95639 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -76,7 +76,8 @@ const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
* closed so that the data may be written there.
*/
bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ bc_specification *compress)
{
#ifdef HAVE_LIBZ
bbstreamer_gzip_writer *streamer;
@@ -115,11 +116,11 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
}
}
- if (gzsetparams(streamer->gzfile, compresslevel,
+ if (gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
+ compress->level, get_gz_error(streamer->gzfile));
exit(1);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index 810052e4e3..a6ec317e2b 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -67,7 +67,7 @@ const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
@@ -89,7 +89,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compresslevel;
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb..caa5edcaf1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -63,7 +63,7 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
@@ -85,7 +85,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
/* Initialize stream compression preferences */
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
+ compress->level);
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2943d9ec1a..ec0abdd4d3 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
#include "access/xlog_internal.h"
#include "bbstreamer.h"
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -57,6 +58,7 @@ typedef struct TablespaceList
typedef struct ArchiveStreamState
{
int tablespacenum;
+ bc_specification *compress;
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
@@ -132,9 +134,6 @@ static bool checksum_failure = false;
static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
-static int compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
-static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -198,7 +197,8 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile);
+ bool expect_unterminated_tarfile,
+ bc_specification *compress);
static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
void *callback_data);
static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
@@ -207,7 +207,7 @@ static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum);
+ bool tablespacenum, bc_specification *compress);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
@@ -215,7 +215,9 @@ static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
static void ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf);
static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
void *callback_data);
-static void BaseBackup(void);
+static void BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc,
+ bc_specification *client_compress);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
bool segment_finished);
@@ -542,7 +544,9 @@ typedef struct
} logstreamer_param;
static int
-LogStreamerMain(logstreamer_param *param)
+LogStreamerMain(logstreamer_param *param,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
StreamCtl stream;
@@ -565,25 +569,14 @@ LogStreamerMain(logstreamer_param *param)
stream.mark_done = true;
stream.partial_suffix = NULL;
stream.replication_slot = replication_slot;
-
if (format == 'p')
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
COMPRESSION_NONE, 0,
stream.do_sync);
- else if (compressloc != COMPRESS_LOCATION_CLIENT)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
- stream.do_sync);
- else if (compressmethod == COMPRESSION_GZIP)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- compressmethod,
- compresslevel,
- stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
+ wal_compress_method,
+ wal_compress_level,
stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
@@ -629,7 +622,9 @@ LogStreamerMain(logstreamer_param *param)
* stream the logfile in parallel with the backups.
*/
static void
-StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
+StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
logstreamer_param *param;
uint32 hi,
@@ -729,7 +724,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
int ret;
/* in child process */
- ret = LogStreamerMain(param);
+ ret = LogStreamerMain(param, wal_compress_method, wal_compress_level);
/* temp debugging aid to analyze 019_replslot_limit failures */
if (verbose)
@@ -1004,136 +999,81 @@ parse_max_rate(char *src)
}
/*
- * Utility wrapper to parse the values specified for -Z/--compress.
- * *methodres and *levelres will be optionally filled with values coming
- * from the parsed results.
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that all the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
+ *
+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwich, the requested algorithm is "turkey"
+ * and the detail string is "sandwich". We'll sort out whether that's legal
+ * at a later stage.
*/
static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
- CompressionLocation *locationres, int *levelres)
+parse_compress_options(char *option, char **algorithm, char **detail,
+ CompressionLocation *locationres)
{
char *sep;
- int firstlen;
- char *firstpart;
+ char *endp;
/*
- * clear 'levelres' so that if there are multiple compression options,
- * the last one fully overrides the earlier ones
- */
- *levelres = 0;
-
- /* check if the option is split in two */
- sep = strchr(src, ':');
-
- /*
- * The first part of the option value could be a method name, or just a
- * level value.
- */
- firstlen = (sep != NULL) ? (sep - src) : strlen(src);
- firstpart = pg_malloc(firstlen + 1);
- memcpy(firstpart, src, firstlen);
- firstpart[firstlen] = '\0';
-
- /*
- * Check if the first part of the string matches with a supported
- * compression method.
+ * Check whether the compression specification consists of a bare integer.
+ *
+ * If so, for backward compatibility, assume gzip.
*/
- if (pg_strcasecmp(firstpart, "gzip") == 0)
+ (void) strtol(option, &endp, 10);
+ if (*endp == '\0')
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ *algorithm = pstrdup("gzip");
+ *detail = pstrdup(option);
+ return;
}
- else if (pg_strcasecmp(firstpart, "client-gzip") == 0)
- {
- *methodres = COMPRESSION_GZIP;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-gzip") == 0)
+
+ /* Strip off any "client-" or "server-" prefix. */
+ if (strncmp(option, "server-", 7) == 0)
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
+ option += 7;
}
- else if (pg_strcasecmp(firstpart, "lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ else if (strncmp(option, "client-", 7) == 0)
{
- *methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "none") == 0)
- {
- *methodres = COMPRESSION_NONE;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ option += 7;
}
else
- {
- /*
- * It does not match anything known, so check for the
- * backward-compatible case of only an integer where the implied
- * compression method changes depending on the level value.
- */
- if (!option_parse_int(firstpart, "-Z/--compress", 0,
- INT_MAX, levelres))
- exit(1);
-
- *methodres = (*levelres > 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
- free(firstpart);
- return;
- }
-
+ /*
+ * Check whether there is a compression detail following the algorithm
+ * name.
+ */
+ sep = strchr(option, ':');
if (sep == NULL)
{
- /*
- * The caller specified a method without a colon separator, so let any
- * subsequent checks assign a default level.
- */
- free(firstpart);
- return;
+ *algorithm = pstrdup(option);
+ *detail = NULL;
}
-
- /* Check the contents after the colon separator. */
- sep++;
- if (*sep == '\0')
+ else
{
- pg_log_error("no compression level defined for method %s", firstpart);
- exit(1);
- }
+ char *alg;
- /*
- * For any of the methods currently supported, the data after the
- * separator can just be an integer.
- */
- if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
- levelres))
- exit(1);
+ alg = palloc((sep - option) + 1);
+ memcpy(alg, option, sep - option);
+ alg[sep - option] = '\0';
- free(firstpart);
+ *algorithm = alg;
+ *detail = pstrdup(sep + 1);
+ }
}
/*
@@ -1200,7 +1140,8 @@ static bbstreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile)
+ bool expect_unterminated_tarfile,
+ bc_specification *compress)
{
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
@@ -1316,32 +1257,28 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
- if (compressmethod == COMPRESSION_NONE ||
- compressloc != COMPRESS_LOCATION_CLIENT)
+ if (compress->algorithm == BACKUP_COMPRESSION_NONE)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- else if (compressmethod == COMPRESSION_GZIP)
+ else if (compress->algorithm == BACKUP_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
- archive_file,
- compresslevel);
+ archive_file, compress);
}
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (compress->algorithm == BACKUP_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_lz4_compressor_new(streamer, compress);
}
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (compress->algorithm == BACKUP_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1395,13 +1332,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ if (format == 'p')
{
- if (compressmethod == COMPRESSION_GZIP)
+ if (is_tar_gz)
streamer = bbstreamer_gzip_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (is_tar_lz4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (is_tar_zstd)
streamer = bbstreamer_zstd_decompressor_new(streamer);
}
@@ -1415,13 +1352,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* manifest if present - as a single COPY stream.
*/
static void
-ReceiveArchiveStream(PGconn *conn)
+ReceiveArchiveStream(PGconn *conn, bc_specification *compress)
{
ArchiveStreamState state;
/* Set up initial state. */
memset(&state, 0, sizeof(state));
state.tablespacenum = -1;
+ state.compress = compress;
/* All the real work happens in ReceiveArchiveStreamChunk. */
ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
@@ -1542,7 +1480,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
CreateBackupStreamer(archive_name,
spclocation,
&state->manifest_inject_streamer,
- true, false);
+ true, false,
+ state->compress);
}
break;
}
@@ -1743,7 +1682,7 @@ ReportCopyDataParseError(size_t r, char *copybuf)
*/
static void
ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum)
+ bool tablespacenum, bc_specification *compress)
{
WriteTarState state;
bbstreamer *manifest_inject_streamer;
@@ -1759,7 +1698,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
state.streamer = CreateBackupStreamer(archive_name, spclocation,
&manifest_inject_streamer,
is_recovery_guc_supported,
- expect_unterminated_tarfile);
+ expect_unterminated_tarfile,
+ compress);
state.tablespacenum = tablespacenum;
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
progress_update_filename(NULL);
@@ -1902,7 +1842,8 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
}
static void
-BaseBackup(void)
+BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc, bc_specification *client_compress)
{
PGresult *res;
char *sysidentifier;
@@ -2055,33 +1996,17 @@ BaseBackup(void)
if (compressloc == COMPRESS_LOCATION_SERVER)
{
- char *compressmethodstr = NULL;
-
if (!use_new_option_syntax)
{
pg_log_error("server does not support server-side compression");
exit(1);
}
- switch (compressmethod)
- {
- case COMPRESSION_GZIP:
- compressmethodstr = "gzip";
- break;
- case COMPRESSION_LZ4:
- compressmethodstr = "lz4";
- break;
- case COMPRESSION_ZSTD:
- compressmethodstr = "zstd";
- break;
- default:
- Assert(false);
- break;
- }
AppendStringCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION", compressmethodstr);
- if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
- AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION_LEVEL", compresslevel);
+ "COMPRESSION", compression_algorithm);
+ if (compression_detail != NULL)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_DETAIL",
+ compression_detail);
}
if (verbose)
@@ -2207,15 +2132,33 @@ BaseBackup(void)
*/
if (includewal == STREAM_WAL)
{
+ WalCompressionMethod wal_compress_method;
+ int wal_compress_level;
+
if (verbose)
pg_log_info("starting background WAL receiver");
- StartLogStreamer(xlogstart, starttli, sysidentifier);
+
+ if (client_compress->algorithm == BACKUP_COMPRESSION_GZIP)
+ {
+ wal_compress_method = COMPRESSION_GZIP;
+ wal_compress_level =
+ (client_compress->options & BACKUP_COMPRESSION_OPTION_LEVEL)
+ != 0 ? client_compress->level : 0;
+ }
+ else
+ {
+ wal_compress_method = COMPRESSION_NONE;
+ wal_compress_level = 0;
+ }
+
+ StartLogStreamer(xlogstart, starttli, sysidentifier,
+ wal_compress_method, wal_compress_level);
}
if (serverMajor >= 1500)
{
/* Receive a single tar stream with everything. */
- ReceiveArchiveStream(conn);
+ ReceiveArchiveStream(conn, client_compress);
}
else
{
@@ -2244,7 +2187,8 @@ BaseBackup(void)
spclocation = PQgetvalue(res, i, 1);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ ReceiveTarFile(conn, archive_name, spclocation, i,
+ client_compress);
}
/*
@@ -2511,6 +2455,10 @@ main(int argc, char **argv)
int c;
int option_index;
+ char *compression_algorithm = "none";
+ char *compression_detail = NULL;
+ CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
+ bc_specification client_compress;
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2616,17 +2564,13 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
- compressmethod = COMPRESSION_GZIP;
+ compression_algorithm = "gzip";
+ compression_detail = NULL;
compressloc = COMPRESS_LOCATION_UNSPECIFIED;
break;
case 'Z':
- parse_compress_options(optarg, &compressmethod,
- &compressloc, &compresslevel);
+ parse_compress_options(optarg, &compression_algorithm,
+ &compression_detail, &compressloc);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2753,12 +2697,11 @@ main(int argc, char **argv)
}
/*
- * If we're compressing the backup and the user has not said where to
- * perform the compression, do it on the client, unless they specified
- * --target, in which case the server is the only choice.
+ * If the user has not specified where to perform backup compression,
+ * default to the client, unless the user specified --target, in which case
+ * the server is the only choice.
*/
- if (compressmethod != COMPRESSION_NONE &&
- compressloc == COMPRESS_LOCATION_UNSPECIFIED)
+ if (compressloc == COMPRESS_LOCATION_UNSPECIFIED)
{
if (backup_target == NULL)
compressloc = COMPRESS_LOCATION_CLIENT;
@@ -2766,6 +2709,40 @@ main(int argc, char **argv)
compressloc = COMPRESS_LOCATION_SERVER;
}
+ /*
+ * If any compression that we're doing is happening on the client side,
+ * we must try to parse the compression algorithm and detail, but if it's
+ * all on the server side, then we're just going to pass through whatever
+ * was requested and let the server decide what to do.
+ */
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ bc_algorithm alg;
+ char *error_detail;
+
+ if (!parse_bc_algorithm(compression_algorithm, &alg))
+ {
+ pg_log_error("unrecognized compression algorithm \"%s\"",
+ compression_algorithm);
+ exit(1);
+ }
+
+ parse_bc_specification(alg, compression_detail, &client_compress);
+ error_detail = validate_bc_specification(&client_compress);
+ if (error_detail != NULL)
+ {
+ pg_log_error("invalid compression specification: %s",
+ error_detail);
+ exit(1);
+ }
+ }
+ else
+ {
+ Assert(compressloc == COMPRESS_LOCATION_SERVER);
+ client_compress.algorithm = BACKUP_COMPRESSION_NONE;
+ client_compress.options = 0;
+ }
+
/*
* Can't perform client-side compression if the backup is not being
* sent to the client.
@@ -2779,9 +2756,10 @@ main(int argc, char **argv)
}
/*
- * Compression doesn't make sense unless tar format is in use.
+ * Client-side compression doesn't make sense unless tar format is in use.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT)
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT &&
+ client_compress.algorithm != BACKUP_COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2882,56 +2860,6 @@ main(int argc, char **argv)
}
}
- /* Sanity checks for compression-related options. */
- switch (compressmethod)
- {
- case COMPRESSION_NONE:
- if (compresslevel != 0)
- {
- pg_log_error("cannot use compression level with method %s",
- "none");
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
- progname);
- exit(1);
- }
- break;
- case COMPRESSION_GZIP:
- if (compresslevel > 9)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 9",
- compresslevel, "gzip");
- exit(1);
- }
- if (compressloc == COMPRESS_LOCATION_CLIENT)
- {
-#ifdef HAVE_LIBZ
- if (compresslevel == 0)
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- pg_log_error("this build does not support compression with %s",
- "gzip");
- exit(1);
-#endif
- }
- break;
- case COMPRESSION_LZ4:
- if (compresslevel > 12)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 12",
- compresslevel, "lz4");
- exit(1);
- }
- break;
- case COMPRESSION_ZSTD:
- if (compresslevel > 22)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 22",
- compresslevel, "zstd");
- exit(1);
- }
- break;
- }
-
/*
* Sanity checks for progress reporting options.
*/
@@ -3040,7 +2968,8 @@ main(int argc, char **argv)
free(linkloc);
}
- BaseBackup();
+ BaseBackup(compression_algorithm, compression_detail, compressloc,
+ &client_compress);
success = true;
return 0;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index efefe947d9..ea5ef3152c 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -42,15 +42,15 @@ $node->command_fails(['pg_basebackup'],
# Sanity checks for options
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:1' ],
- qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure if method "none" specified with compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none+' ],
- qr/\Qpg_basebackup: error: invalid value "none+" for option/,
+ qr/\Qunrecognized compression algorithm "none+"/,
'failure on incorrect separator to define compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:' ],
- qr/\Qpg_basebackup: error: no compression level defined for method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure on missing compression level value');
# Some Windows ANSI code pages may reject this filename, in which case we
@@ -89,6 +89,41 @@ print $conf "wal_level = replica\n";
close $conf;
$node->restart;
+# Now that we have a server that supports replication commands, test whether
+# certain scenarios fail on the client side or on the server side.
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'gzip:level=236' ],
+ qr/\Qpg_basebackup: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9/,
+ 'client failure on out-of-range compression level');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'server-gzip:level=236' ],
+ qr/\Qpg_basebackup: error: could not initiate base backup: ERROR: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9/,
+ 'server failure on out-of-range compression level');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'client-extrasquishy' ],
+ qr/\Qpg_basebackup: error: unrecognized compression algorithm "extrasquishy"/,
+ 'client failure on invalid compression algorithm');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'server-extrasquishy' ],
+ qr/\Qpg_basebackup: error: could not initiate base backup: ERROR: unrecognized compression algorithm "extrasquishy"/,
+ 'server failure on invalid compression algorithm');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'gzip:thunk' ],
+ qr/\Qpg_basebackup: error: invalid compression specification: unknown compression option "thunk"/,
+ 'client failure on unknown compression option');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'gzip:level=high' ],
+ qr/\Qpg_basebackup: error: invalid compression specification: value for compression option "level" must be an integer/,
+ 'client failure on non-numeric compression level');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'server-gzip:thunk' ],
+ qr/\Qpg_basebackup: error: could not initiate base backup: ERROR: invalid compression specification: unknown compression option "thunk"/,
+ 'server failure on unknown compression option');
+$node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'server-gzip:level=high' ],
+ qr/\Qpg_basebackup: error: could not initiate base backup: ERROR: invalid compression specification: value for compression option "level" must be an integer/,
+ 'server failure on non-numeric compression level');
+
# Write some files to test that they are not copied.
foreach my $filename (
qw(backup_label tablespace_map postgresql.auto.conf.tmp
diff --git a/src/common/Makefile b/src/common/Makefile
index 31c0dd366d..f627349835 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -47,6 +47,7 @@ LIBS += $(PTHREAD_LIBS)
OBJS_COMMON = \
archive.o \
+ backup_compression.o \
base64.o \
checksum_helper.o \
config_info.o \
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
new file mode 100644
index 0000000000..bf426d6b7e
--- /dev/null
+++ b/src/common/backup_compression.c
@@ -0,0 +1,258 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.c
+ *
+ * Shared code for backup compression methods and specifications.
+ *
+ * A compression specification specifies the parameters that should be used
+ * when performing compression with a specific algorithm. The simplest
+ * possible compression specification is an integer, which sets the
+ * compression level.
+ *
+ * Otherwise, a compression specification is a comma-separated list of items,
+ * each having the form keyword or keyword=value.
+ *
+ * Currently, the only supported keyword is "level".
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.c
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/backup_compression.h"
+
+static int expect_integer_value(char *keyword, char *value,
+ bc_specification *result);
+
+/*
+ * Look up a compression algorithm by name. Returns true and sets *algorithm
+ * if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_bc_algorithm(char *name, bc_algorithm *algorithm)
+{
+ if (strcmp(name, "none") == 0)
+ *algorithm = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(name, "gzip") == 0)
+ *algorithm = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(name, "lz4") == 0)
+ *algorithm = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(name, "zstd") == 0)
+ *algorithm = BACKUP_COMPRESSION_ZSTD;
+ else
+ return false;
+ return true;
+}
+
+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+const char *
+get_bc_algorithm_name(bc_algorithm algorithm)
+{
+ switch (algorithm)
+ {
+ case BACKUP_COMPRESSION_NONE:
+ return "none";
+ case BACKUP_COMPRESSION_GZIP:
+ return "gzip";
+ case BACKUP_COMPRESSION_LZ4:
+ return "lz4";
+ case BACKUP_COMPRESSION_ZSTD:
+ return "zstd";
+ /* no default, to provoke compiler warnings if values are added */
+ }
+ Assert(false);
+}
+
+/*
+ * Parse a compression specification for a specified algorithm.
+ *
+ * See the file header comments for a brief description of what a compression
+ * specification is expected to look like.
+ *
+ * On return, all fields of the result object will be initialized.
+ * In particular, result->parse_error will be NULL if no errors occurred
+ * during parsing, and will otherwise contain an appropriate error message.
+ * The caller may free this error message string using pfree, if desired.
+ * Note, however, that even if there's no parse error, the string might not
+ * make sense: e.g. for gzip, level=12 is not sensible, but it parses OK.
+ *
+ * Use validate_bc_specification() to find out whether a compression
+ * specification is semantically sensible.
+ */
+void
+parse_bc_specification(bc_algorithm algorithm, char *specification,
+ bc_specification *result)
+{
+ int bare_level;
+ char *bare_level_endp;
+
+ /* Initial setup of result object. */
+ result->algorithm = algorithm;
+ result->options = 0;
+ result->level = -1;
+ result->parse_error = NULL;
+
+ /* If there is no specification, we're done already. */
+ if (specification == NULL)
+ return;
+
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
+ if (*bare_level_endp == '\0')
+ {
+ result->level = bare_level;
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ return;
+ }
+
+ /* Look for comma-separated keyword or keyword=value entries. */
+ while (1)
+ {
+ char *kwstart;
+ char *kwend;
+ char *vstart;
+ char *vend;
+ int kwlen;
+ int vlen;
+ char *keyword;
+ char *value;
+
+ /* Figure start, end, and length of next keyword and any value. */
+ kwstart = kwend = specification;
+ while (*kwend != '\0' && *kwend != ',' && *kwend != '=')
+ ++kwend;
+ kwlen = kwend - kwstart;
+ if (*kwend != '=')
+ {
+ vstart = vend = NULL;
+ vlen = 0;
+ }
+ else
+ {
+ vstart = vend = kwend + 1;
+ while (*vend != '\0' && *vend != ',')
+ ++vend;
+ vlen = vend - vstart;
+ }
+
+ /* Reject empty keyword. */
+ if (kwlen == 0)
+ {
+ result->parse_error =
+ pstrdup("found empty string where a compression option was expected");
+ break;
+ }
+
+ /* Extract keyword and value as separate C strings. */
+ keyword = palloc(kwlen + 1);
+ memcpy(keyword, kwstart, kwlen);
+ keyword[kwlen] = '\0';
+ if (vlen == 0)
+ value = NULL;
+ else
+ {
+ value = palloc(vlen + 1);
+ memcpy(value, vstart, vlen);
+ value[vlen] = '\0';
+ }
+
+ /* Handle whatever keyword we found. */
+ if (strcmp(keyword, "level") == 0)
+ {
+ result->level = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ }
+ else
+ result->parse_error =
+ psprintf("unknown compression option \"%s\"", keyword);
+
+ /* Release memory, just to be tidy. */
+ pfree(keyword);
+ if (value != NULL)
+ pfree(value);
+
+ /* If we got an error or have reached the end of the string, stop. */
+ if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+ break;
+
+ /* Advance to next entry and loop around. */
+ specification = vend == NULL ? kwend + 1 : vend + 1;
+ }
+}
+
+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ ivalue = strtol(value, &ivalue_endp, 10);
+ if (*ivalue_endp != '\0')
+ {
+ result->parse_error =
+ psprintf("value for compression option \"%s\" must be an integer",
+ keyword);
+ return -1;
+ }
+ return ivalue;
+}
+
+/*
+ * Returns NULL if the compression specification string was syntactically
+ * valid and semantically sensible. Otherwise, returns an error message.
+ *
+ * Does not test whether this build of PostgreSQL supports the requested
+ * compression method.
+ */
+char *
+validate_bc_specification(bc_specification *spec)
+{
+ /* If it didn't even parse OK, it's definitely no good. */
+ if (spec->parse_error != NULL)
+ return spec->parse_error;
+
+ /*
+ * If a compression level was specified, check that the algorithm expects
+ * a compression level and that the level is within the legal range for
+ * the algorithm.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ int min_level = 1;
+ int max_level;
+
+ if (spec->algorithm == BACKUP_COMPRESSION_GZIP)
+ max_level = 9;
+ else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
+ max_level = 12;
+ else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ max_level = 22;
+ else
+ return psprintf("compression algorithm \"%s\" does not accept a compression level",
+ get_bc_algorithm_name(spec->algorithm));
+
+ if (spec->level < min_level || spec->level > max_level)
+ return psprintf("compression algorithm \"%s\" expects a compression level between %d and %d",
+ get_bc_algorithm_name(spec->algorithm),
+ min_level, max_level);
+ }
+
+ return NULL;
+}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
new file mode 100644
index 0000000000..0565cbc657
--- /dev/null
+++ b/src/include/common/backup_compression.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.h
+ *
+ * Shared definitions for backup compression methods and specifications.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/common/backup_compression.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BACKUP_COMPRESSION_H
+#define BACKUP_COMPRESSION_H
+
+typedef enum bc_algorithm
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
+} bc_algorithm;
+
+#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+
+typedef struct bc_specification
+{
+ bc_algorithm algorithm;
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
+ int level;
+ char *parse_error; /* NULL if parsing was OK, else message */
+} bc_specification;
+
+extern bool parse_bc_algorithm(char *name, bc_algorithm *algorithm);
+extern const char *get_bc_algorithm_name(bc_algorithm algorithm);
+
+extern void parse_bc_specification(bc_algorithm algorithm,
+ char *specification,
+ bc_specification *result);
+
+extern char *validate_bc_specification(bc_specification *);
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a7f16758a4..654df28576 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -27,6 +27,7 @@
#define BASEBACKUP_SINK_H
#include "access/xlog_internal.h"
+#include "common/backup_compression.h"
#include "nodes/pg_list.h"
/* Forward declarations. */
@@ -283,9 +284,9 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
-extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_gzip_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_lz4_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_zstd_new(bbsink *next, bc_specification *);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 441d6ae6bf..de8676d339 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -124,7 +124,7 @@ sub mkvcbuild
}
our @pgcommonallfiles = qw(
- archive.c base64.c checksum_helper.c
+ archive.c backup_compression.c base64.c checksum_helper.c
config_info.c controldata_utils.c d2s.c encnames.c exec.c
f2s.c file_perm.c file_utils.c hashfn.c ip.c jsonapi.c
keywords.c kwlookup.c link-canary.c md5_common.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaf3e7a8d4..01748cac07 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3788,3 +3788,5 @@ yyscan_t
z_stream
z_streamp
zic_t
+bc_algorithm
+bc_specification
--
2.24.3 (Apple Git-128)
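To make the intended calling sequence concrete, here is a minimal
sketch (not part of the patch itself) of how a frontend caller might
drive the new functions, mirroring what pg_basebackup does; error
handling is abbreviated:

#include "postgres_fe.h"

#include "common/backup_compression.h"
#include "common/logging.h"

/* Sketch only: parse and validate a user-supplied specification. */
static void
check_compression(char *algorithm_name, char *detail_str)
{
	bc_algorithm algorithm;
	bc_specification spec;
	char	   *error_detail;

	if (!parse_bc_algorithm(algorithm_name, &algorithm))
	{
		pg_log_error("unrecognized compression algorithm \"%s\"",
					 algorithm_name);
		exit(1);
	}

	/* detail_str may be NULL, a bare integer, or keyword=value items */
	parse_bc_specification(algorithm, detail_str, &spec);
	error_detail = validate_bc_specification(&spec);
	if (error_detail != NULL)
	{
		pg_log_error("invalid compression specification: %s",
					 error_detail);
		exit(1);
	}
}

A server-side caller does the same thing with ereport() in place of
pg_log_error(), as parse_basebackup_options() shows.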
Robert Haas <robertmhaas@gmail.com> writes:
Should this also set/check errno ?
And check if value != ivalue_endp ?
See strtol(3)
Even after reading the man page for strtol, it's not clear to me that
this is needed. That page represents checking *endptr != '\0' as
sufficient to tell whether an error occurred.
I'm not sure whose man page you looked at, but the POSIX standard [1]https://pubs.opengroup.org/onlinepubs/9699919799/
has a pretty clear opinion about this:
Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX} are
returned on error and are also valid returns on success, an
application wishing to check for error situations should set errno to
0, then call strtol() or strtoll(), then check errno.
Checking *endptr != '\0' is for detecting whether there is trailing
garbage after the number; which may be an error case or not as you
choose, but it's a different matter.
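In code form, the checking POSIX describes looks roughly like this (a
generic sketch, not code from the patch):

#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse str as an int, accepting only a clean, in-range parse:
 * clear errno first, check it afterwards, and reject empty input
 * and trailing garbage as separate cases.
 */
static bool
parse_int_strict(const char *str, int *result)
{
	char	   *endp;
	long		val;

	errno = 0;
	val = strtol(str, &endp, 10);

	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return false;			/* out of range */
	if (endp == str)
		return false;			/* no digits at all, including "" */
	if (*endp != '\0')
		return false;			/* trailing garbage */

	*result = (int) val;
	return true;
}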
regards, tom lane
On Sun, Mar 20, 2022 at 03:05:28PM -0400, Robert Haas wrote:
On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
- errmsg("unrecognized compression algorithm: \"%s\"", + errmsg("unrecognized compression algorithm \"%s\"",Most other places seem to say "compression method". So I'd suggest to change
that here, and in doc/src/sgml/ref/pg_basebackup.sgml.I'm not sure that's really better, and I don't think this patch is
introducing an altogether novel usage. I think I would probably try to
standardize on algorithm rather than method if I were standardizing
the whole source tree, but I think we can leave that discussion for
another time.
The user-facing docs are already standardized using "compression method", with
2 exceptions, of which one is contrib/ and the other is what I'm suggesting to
make consistent here.
$ git grep 'compression algorithm' doc
doc/src/sgml/pgcrypto.sgml: Which compression algorithm to use. Only available if
doc/src/sgml/ref/pg_basebackup.sgml: compression algorithm is selected, or if server-side compression
+ result->parse_error =
+ pstrdup("found empty string where a compression option was expected");
Needs to be localized with _() ?
Also, document that it's pstrdup'd.
Did the latter. The former would need to be fixed in a bunch of places
and while I'm happy to accept an expert opinion on exactly what needs
to be done here, I don't want to try to do it and do it wrong. Better
to let someone with good knowledge of the subject matter patch it up
later than do a crummy job now.
I believe it just needs _("foo")
See git grep '= _('
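For the message quoted above, that would be, e.g.:

	result->parse_error =
		pstrdup(_("found empty string where a compression option was expected"));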
I mentioned another issue off-list:
pg_basebackup.c:2741:10: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
2741 | Assert(compressloc = COMPRESS_LOCATION_SERVER);
| ^~~~~~~~~~~
pg_basebackup.c:2741:3: note: in expansion of macro ‘Assert’
2741 | Assert(compressloc = COMPRESS_LOCATION_SERVER);
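Presumably the intended test is equality:

	Assert(compressloc == COMPRESS_LOCATION_SERVER);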
This crashes the server using your v2 patch:
src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-zstd:level, |wc -c
I wonder whether the syntax should really use both ":" and ",".
Maybe ":" isn't needed at all.
This patch also needs to update the other user-facing docs.
typo: contain a an
--
Justin
On Sun, Mar 20, 2022 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Even after reading the man page for strtol, it's not clear to me that
this is needed. That page represents checking *endptr != '\0' as
sufficient to tell whether an error occurred.
I'm not sure whose man page you looked at, but the POSIX standard [1]
has a pretty clear opinion about this:
Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX} are
returned on error and are also valid returns on success, an
application wishing to check for error situations should set errno to
0, then call strtol() or strtoll(), then check errno.
Checking *endptr != '\0' is for detecting whether there is trailing
garbage after the number; which may be an error case or not as you
choose, but it's a different matter.
I think I'm guilty of verbal inexactitude here but not bad coding.
Checking for *endptr != '\0', as I did, is not sufficient to detect
"whether an error occurred," as I alleged. But, in the part of my
response you didn't quote, I believe I made it clear that I only need
to detect garbage, not out-of-range values. And I think *endptr !=
'\0' will do that.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
I think I'm guilty of verbal inexactitude here but not bad coding.
Checking for *endptr != '\0', as I did, is not sufficient to detect
"whether an error occurred," as I alleged. But, in the part of my
response you didn't quote, I believe I made it clear that I only need
to detect garbage, not out-of-range values. And I think *endptr !=
'\0' will do that.
Hmm ... do you consider an empty string to be valid input?
regards, tom lane
On Sun, Mar 20, 2022 at 3:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
The user-facing docs are already standardized using "compression method", with
2 exceptions, of which one is contrib/ and the other is what I'm suggesting to
make consistent here.
$ git grep 'compression algorithm' doc
doc/src/sgml/pgcrypto.sgml: Which compression algorithm to use. Only available if
doc/src/sgml/ref/pg_basebackup.sgml: compression algorithm is selected, or if server-side compression
Well, if you just count the number of occurrences of each string in
the documentation, sure. But all of the ones that are talking about a
compression method seem to have to do with configurable TOAST
compression, and the fact that the documentation for that feature is
more extensive than for the pre-existing feature that refers to a
compression algorithm does not, at least in my view, turn it into a
project standard from which no deviation is permitted.
Did the latter. The former would need to be fixed in a bunch of places
and while I'm happy to accept an expert opinion on exactly what needs
to be done here, I don't want to try to do it and do it wrong. Better
to let someone with good knowledge of the subject matter patch it up
later than do a crummy job now.
I believe it just needs _("foo")
See git grep '= _('
Hmm. Maybe.
I mentioned another issue off-list:
pg_basebackup.c:2741:10: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
2741 | Assert(compressloc = COMPRESS_LOCATION_SERVER);
| ^~~~~~~~~~~
pg_basebackup.c:2741:3: note: in expansion of macro ‘Assert’
2741 | Assert(compressloc = COMPRESS_LOCATION_SERVER);
This crashes the server using your v2 patch:
src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-zstd:level, |wc -c
Well that's unfortunate. Will fix.
I wonder whether the syntax should really use both ":" and ",".
Maybe ":" isn't needed at all.
I don't think we should treat the compression method name in the same
way as a compression algorithm option.
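To spell out the division of labor as the patch has it, here is a
walk-through in comment form; the "workers" keyword is invented purely
for illustration and is not an option this patch adds:

	/*
	 * --compress server-zstd:level=3,workers=4
	 *
	 * parse_compress_options() splits on the first ':':
	 *     location  = COMPRESS_LOCATION_SERVER
	 *     algorithm = "zstd"
	 *     detail    = "level=3,workers=4"
	 *
	 * parse_bc_specification() then splits the detail on ',':
	 *     level = 3, plus whatever "workers" might someday mean
	 */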
This patch also needs to update the other user-facing docs.
Which ones exactly?
typo: contain a an
OK, will fix.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sun, Mar 20, 2022 at 9:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I think I'm guilty of verbal inexactitude here but not bad coding.
Checking for *endptr != '\0', as I did, is not sufficient to detect
"whether an error occurred," as I alleged. But, in the part of my
response you didn't quote, I believe I made it clear that I only need
to detect garbage, not out-of-range values. And I think *endptr !=
'\0' will do that.
Hmm ... do you consider an empty string to be valid input?
No, and I thought I had checked properly for that condition before
reaching the point in the code where I call strtol(), but it turns out
I have not, which I guess is what Justin has been trying to tell me
for a few emails now.
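For what it's worth, the missing guard amounts to something like this
at the top of expect_integer_value() (a sketch; the exact v3 wording
may differ):

	if (value == NULL)
	{
		result->parse_error =
			psprintf(_("compression option \"%s\" requires a value"),
					 keyword);
		return -1;
	}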
I'll send an updated patch tomorrow after looking this all over more carefully.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
This patch also needs to update the other user-facing docs.
Which ones exactly?
I mean pg_basebackup -Z
-Z level
-Z [{client|server}-]method[:level]
--compress=level
--compress=[{client|server}-]method[:level]
On Mon, Mar 21, 2022 at 9:18 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
This patch also needs to update the other user-facing docs.
Which ones exactly?
I mean pg_basebackup -Z
-Z level
-Z [{client|server}-]method[:level]
--compress=level
--compress=[{client|server}-]method[:level]
Ah, right. Thanks.
Here's v3. I have updated that section of the documentation. I also
went and added a bunch more test cases for validation of compression
detail strings, many inspired by your examples, and fixed all the bugs
that I found in the process. I think the crashes you complained about
are now fixed, but please let me know if I have missed any. I also
added _() calls as you suggested. I searched for the "contain a an"
typo that you mentioned but was not able to find it. Can you give me a
more specific pointer?
I looked a little bit more at the compression method vs. compression
algorithm thing. I agree that there is some inconsistency in
terminology here, but I'm still not sure that we are well-served by
trying to make it totally uniform, especially if we pick the word
"method" as the standard rather than "algorithm". In my opinion,
"method" is less specific than "algorithm". If someone asks me to
choose a compression algorithm, I know that I should give an answer
like "lz4" or "zstd". If they ask me to pick a compression method, I'm
not quite sure whether they want that kind of answer or whether they
want something more detailed, like "use lz4 with compression level 3
and a 1MB block size". After all, that is (at least according to my
understanding of how English works) a perfectly valid answer to the
question "what method should I use to compress this data?" -- but not
to the question "what algorithm should I use to compress this data?".
The latter can ONLY be properly answered by saying something like
"lz4". And I think that's really the root of my hesitation to make the
kinds of changes you want here. If it's just a question of specifying
a compression algorithm and a level, I don't think using the name
"method" for the algorithm is going to be too bad. But as we enrich
the system with multiple compression algorithms each of which may have
multiple and different parameters, I think the whole thing becomes
murkier and the need for precision in language goes up.
Now that is of course an arguable position and you're welcome to
disagree with it, but I think that's part of why I'm hesitating.
Another part of it, at least for me, is that complete uniformity is
not always a positive. I suppose all of us have had the experience at
some point of reading a manual that says something like "to activate
the boil water function, press and release the 'boil water' button"
and rolled our eyes at how useless it was. It's important to me that
we don't fall into that trap. We clearly don't want to go ballistic
and have random inconsistencies in language for no reason, but at the
same time, it's not useful to tell people that METHOD should be
replaced with a compression method and LEVEL with a compression level.
I mean, if you end up saying something like that interspersed with
non-obvious information, that is OK, and I don't want to overstate the
point I'm trying to make. But it seems to me that if there's a little
variation in phrasing and we end up saying that METHOD means the
compression algorithm or that ALGORITHM means the compression method
or whatever, that can actually make things more clear. Here again it's
debatable: how much variation in phraseology is helpful, and at what
point does it just start to seem inconsistent? Well, everyone may have
their own opinion.
I'm not trying to pretend that this patch (or the existing code base)
gets this all right. But I do think that, to the extent that we have a
considered position on what to do here, we can make that change later,
perhaps even after getting some user feedback on what does and does
not make sense to other people. And I also think that what we end up
doing here may well end up being more nuanced than a blanket
search-and-replace. I'm not saying we couldn't make a blanket
search-and-replace. I just don't see it as necessarily creating value,
or being all that closely connected to the goal of this patch, which
is to quickly clean up a forward-compatibility risk before we hit
feature freeze.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
From 6a27b676550e0b58c49a3d852294ac6e66a6b169 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 21 Mar 2022 12:09:11 -0400
Subject: [PATCH v3] Replace BASE_BACKUP COMPRESSION_LEVEL option with
COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby.
---
doc/src/sgml/protocol.sgml | 18 +-
doc/src/sgml/ref/pg_basebackup.sgml | 25 +-
src/backend/replication/basebackup.c | 62 +--
src/backend/replication/basebackup_gzip.c | 20 +-
src/backend/replication/basebackup_lz4.c | 19 +-
src/backend/replication/basebackup_zstd.c | 19 +-
src/bin/pg_basebackup/bbstreamer.h | 7 +-
src/bin/pg_basebackup/bbstreamer_gzip.c | 7 +-
src/bin/pg_basebackup/bbstreamer_lz4.c | 4 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 4 +-
src/bin/pg_basebackup/pg_basebackup.c | 409 ++++++++-----------
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 72 +++-
src/common/Makefile | 1 +
src/common/backup_compression.c | 269 ++++++++++++
src/include/common/backup_compression.h | 44 ++
src/include/replication/basebackup_sink.h | 7 +-
src/tools/msvc/Mkvcbuild.pm | 2 +-
src/tools/pgindent/typedefs.list | 2 +
18 files changed, 662 insertions(+), 329 deletions(-)
create mode 100644 src/common/backup_compression.c
create mode 100644 src/include/common/backup_compression.h
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <term><literal>COMPRESSION_DETAIL</literal> <replaceable>detail</replaceable></term>
<listitem>
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- For <literal>gzip</literal> the value should be an integer between 1
- and 9, for <literal>lz4</literal> between 1 and 12, and for
- <literal>zstd</literal> it should be between 1 and 22.
+ If the value is an integer, it specifies the compression level.
+ Otherwise, it should be a comma-separated list of items, each of
+ the form <literal>keyword</literal> or
+ <literal>keyword=value</literal>. Currently, the only supported
+ keyword is <literal>level</literal>, which sets the compression
+ level.
+ </para>
+
+ <para>
+ For <literal>gzip</literal> the compression level should be an
+ integer between 1 and 9, for <literal>lz4</literal> an integer
+ between 1 and 12, and for <literal>zstd</literal> an integer
+ between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 4a630b59b7..46d7f15e54 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -399,9 +399,9 @@ PostgreSQL documentation
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
- <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>detail</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>detail</replaceable>]</term>
<listitem>
<para>
Requests compression of the backup. If <literal>client</literal> or
@@ -419,13 +419,20 @@ PostgreSQL documentation
<para>
The compression method can be set to <literal>gzip</literal>,
<literal>lz4</literal>, <literal>zstd</literal>, or
- <literal>none</literal> for no compression. A compression level can
- optionally be specified, by appending the level number after a colon
- (<literal>:</literal>). If no level is specified, the default
- compression level will be used. If only a level is specified without
- mentioning an algorithm, <literal>gzip</literal> compression will be
- used if the level is greater than 0, and no compression will be used if
- the level is 0.
+ <literal>none</literal> for no compression. A compression detail
+ string can optionally be specified. If the detail string is an
+ integer, it specifies the compression level. Otherwise, it should be
+ a comma-separated list of items, each of the form
+ <literal>keyword</literal> or <literal>keyword=value</literal>.
+ Currently, the only supported keyword is <literal>level</literal>,
+ which sets the compression level.
+ </para>
+ <para>
+ If no compression level is specified, the default compression level
+ will be used. If only a level is specified without mentioning an
+ algorithm, <literal>gzip</literal> compression will be used if the
+ level is greater than 0, and no compression will be used if the level
+ is 0.
</para>
<para>
When the tar format is used with <literal>gzip</literal>,
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2aedc14a2..49deead091 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,6 +17,7 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
#include "lib/stringinfo.h"
@@ -54,14 +55,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4,
- BACKUP_COMPRESSION_ZSTD
-} basebackup_compression_type;
-
typedef struct
{
const char *label;
@@ -75,8 +68,8 @@ typedef struct
bool use_copytblspc;
BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
- basebackup_compression_type compression;
- int compression_level;
+ bc_algorithm compression;
+ bc_specification compression_specification;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -713,12 +706,14 @@ parse_basebackup_options(List *options, basebackup_options *opt)
char *target_str = NULL;
char *target_detail_str = NULL;
bool o_compression = false;
- bool o_compression_level = false;
+ bool o_compression_detail = false;
+ char *compression_detail_str = NULL;
MemSet(opt, 0, sizeof(*opt));
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
+ opt->compression_specification.algorithm = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -885,29 +880,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "none") == 0)
- opt->compression = BACKUP_COMPRESSION_NONE;
- else if (strcmp(optval, "gzip") == 0)
- opt->compression = BACKUP_COMPRESSION_GZIP;
- else if (strcmp(optval, "lz4") == 0)
- opt->compression = BACKUP_COMPRESSION_LZ4;
- else if (strcmp(optval, "zstd") == 0)
- opt->compression = BACKUP_COMPRESSION_ZSTD;
- else
+ if (!parse_bc_algorithm(optval, &opt->compression))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized compression algorithm: \"%s\"",
+ errmsg("unrecognized compression algorithm \"%s\"",
optval)));
o_compression = true;
}
- else if (strcmp(defel->defname, "compression_level") == 0)
+ else if (strcmp(defel->defname, "compression_detail") == 0)
{
- if (o_compression_level)
+ if (o_compression_detail)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->compression_level = defGetInt32(defel);
- o_compression_level = true;
+ compression_detail_str = defGetString(defel);
+ o_compression_detail = true;
}
else
ereport(ERROR,
@@ -949,10 +936,25 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_handle =
BaseBackupGetTargetHandle(target_str, target_detail_str);
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("compression level requires compression")));
+ errmsg("compression detail requires compression")));
+
+ if (o_compression)
+ {
+ char *error_detail;
+
+ parse_bc_specification(opt->compression, compression_detail_str,
+ &opt->compression_specification);
+ error_detail =
+ validate_bc_specification(&opt->compression_specification);
+ if (error_detail != NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid compression specification: %s",
+ error_detail));
+ }
}
@@ -998,11 +1000,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
- sink = bbsink_gzip_new(sink, opt.compression_level);
+ sink = bbsink_gzip_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
- sink = bbsink_lz4_new(sink, opt.compression_level);
+ sink = bbsink_lz4_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
- sink = bbsink_zstd_new(sink, opt.compression_level);
+ sink = bbsink_zstd_new(sink, &opt.compression_specification);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index b66d3da7a3..703a91ba77 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_gzip_ops = {
#endif
/*
- * Create a new basebackup sink that performs gzip compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs gzip compression.
*/
bbsink *
-bbsink_gzip_new(bbsink *next, int compresslevel)
+bbsink_gzip_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef HAVE_LIBZ
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,15 +72,14 @@ bbsink_gzip_new(bbsink *next, int compresslevel)
bbsink_gzip *sink;
Assert(next != NULL);
- Assert(compresslevel >= 0 && compresslevel <= 9);
- if (compresslevel == 0)
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
compresslevel = Z_DEFAULT_COMPRESSION;
- else if (compresslevel < 0 || compresslevel > 9)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("gzip compression level %d is out of range",
- compresslevel)));
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 9);
+ }
sink = palloc0(sizeof(bbsink_gzip));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index d838f723d0..06c161ddc4 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_lz4_ops = {
#endif
/*
- * Create a new basebackup sink that performs lz4 compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs lz4 compression.
*/
bbsink *
-bbsink_lz4_new(bbsink *next, int compresslevel)
+bbsink_lz4_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_LZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -72,11 +73,13 @@ bbsink_lz4_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 12)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("lz4 compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 12);
+ }
sink = palloc0(sizeof(bbsink_lz4));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27..96b7985693 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -55,12 +55,13 @@ const bbsink_ops bbsink_zstd_ops = {
#endif
/*
- * Create a new basebackup sink that performs zstd compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs zstd compression.
*/
bbsink *
-bbsink_zstd_new(bbsink *next, int compresslevel)
+bbsink_zstd_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_ZSTD
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,11 +72,13 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 22)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("zstd compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 22);
+ }
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index 02d4c05df6..dfa3f77af4 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -22,6 +22,7 @@
#ifndef BBSTREAMER_H
#define BBSTREAMER_H
+#include "common/backup_compression.h"
#include "lib/stringinfo.h"
#include "pqexpbuffer.h"
@@ -200,17 +201,17 @@ bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
*/
extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 894f857103..1979e95639 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -76,7 +76,8 @@ const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
* closed so that the data may be written there.
*/
bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ bc_specification *compress)
{
#ifdef HAVE_LIBZ
bbstreamer_gzip_writer *streamer;
@@ -115,11 +116,11 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
}
}
- if (gzsetparams(streamer->gzfile, compresslevel,
+ if (gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
+ compress->level, get_gz_error(streamer->gzfile));
exit(1);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index 810052e4e3..a6ec317e2b 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -67,7 +67,7 @@ const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
@@ -89,7 +89,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compresslevel;
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb..caa5edcaf1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -63,7 +63,7 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
@@ -85,7 +85,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
/* Initialize stream compression preferences */
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
+ compress->level);
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2943d9ec1a..3e6977df1a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
#include "access/xlog_internal.h"
#include "bbstreamer.h"
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -57,6 +58,7 @@ typedef struct TablespaceList
typedef struct ArchiveStreamState
{
int tablespacenum;
+ bc_specification *compress;
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
@@ -132,9 +134,6 @@ static bool checksum_failure = false;
static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
-static int compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
-static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -198,7 +197,8 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile);
+ bool expect_unterminated_tarfile,
+ bc_specification *compress);
static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
void *callback_data);
static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
@@ -207,7 +207,7 @@ static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum);
+ bool tablespacenum, bc_specification *compress);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
@@ -215,7 +215,9 @@ static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
static void ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf);
static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
void *callback_data);
-static void BaseBackup(void);
+static void BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc,
+ bc_specification *client_compress);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
bool segment_finished);
@@ -405,8 +407,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"
- " compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=[{client|server}-]METHOD[:DETAIL]\n"
+ " compress on client or server as specified\n"));
printf(_(" -Z, --compress=none do not compress tar output\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -542,7 +544,9 @@ typedef struct
} logstreamer_param;
static int
-LogStreamerMain(logstreamer_param *param)
+LogStreamerMain(logstreamer_param *param,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
StreamCtl stream;
@@ -565,25 +569,14 @@ LogStreamerMain(logstreamer_param *param)
stream.mark_done = true;
stream.partial_suffix = NULL;
stream.replication_slot = replication_slot;
-
if (format == 'p')
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
COMPRESSION_NONE, 0,
stream.do_sync);
- else if (compressloc != COMPRESS_LOCATION_CLIENT)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
- stream.do_sync);
- else if (compressmethod == COMPRESSION_GZIP)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- compressmethod,
- compresslevel,
- stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
+ wal_compress_method,
+ wal_compress_level,
stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
@@ -629,7 +622,9 @@ LogStreamerMain(logstreamer_param *param)
* stream the logfile in parallel with the backups.
*/
static void
-StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
+StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
logstreamer_param *param;
uint32 hi,
@@ -729,7 +724,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
int ret;
/* in child process */
- ret = LogStreamerMain(param);
+ ret = LogStreamerMain(param, wal_compress_method, wal_compress_level);
/* temp debugging aid to analyze 019_replslot_limit failures */
if (verbose)
@@ -1004,136 +999,81 @@ parse_max_rate(char *src)
}
/*
- * Utility wrapper to parse the values specified for -Z/--compress.
- * *methodres and *levelres will be optionally filled with values coming
- * from the parsed results.
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
+ *
+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwich, the requested algorithm is "turkey"
+ * and the detail string is "sandwich". We'll sort out whether that's legal
+ * at a later stage.
*/
static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
- CompressionLocation *locationres, int *levelres)
+parse_compress_options(char *option, char **algorithm, char **detail,
+ CompressionLocation *locationres)
{
char *sep;
- int firstlen;
- char *firstpart;
+ char *endp;
/*
- * clear 'levelres' so that if there are multiple compression options,
- * the last one fully overrides the earlier ones
- */
- *levelres = 0;
-
- /* check if the option is split in two */
- sep = strchr(src, ':');
-
- /*
- * The first part of the option value could be a method name, or just a
- * level value.
- */
- firstlen = (sep != NULL) ? (sep - src) : strlen(src);
- firstpart = pg_malloc(firstlen + 1);
- memcpy(firstpart, src, firstlen);
- firstpart[firstlen] = '\0';
-
- /*
- * Check if the first part of the string matches with a supported
- * compression method.
+ * Check whether the compression specification consists of a bare integer.
+ *
+ * If so, for backward compatibility, assume gzip.
*/
- if (pg_strcasecmp(firstpart, "gzip") == 0)
+ (void) strtol(option, &endp, 10);
+ if (*endp == '\0')
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ *algorithm = pstrdup("gzip");
+ *detail = pstrdup(option);
+ return;
}
- else if (pg_strcasecmp(firstpart, "client-gzip") == 0)
- {
- *methodres = COMPRESSION_GZIP;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-gzip") == 0)
+
+ /* Strip off any "client-" or "server-" prefix. */
+ if (strncmp(option, "server-", 7) == 0)
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
+ option += 7;
}
- else if (pg_strcasecmp(firstpart, "lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ else if (strncmp(option, "client-", 7) == 0)
{
- *methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "none") == 0)
- {
- *methodres = COMPRESSION_NONE;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ option += 7;
}
else
- {
- /*
- * It does not match anything known, so check for the
- * backward-compatible case of only an integer where the implied
- * compression method changes depending on the level value.
- */
- if (!option_parse_int(firstpart, "-Z/--compress", 0,
- INT_MAX, levelres))
- exit(1);
-
- *methodres = (*levelres > 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
- free(firstpart);
- return;
- }
-
+ /*
+ * Check whether there is a compression detail following the algorithm
+ * name.
+ */
+ sep = strchr(option, ':');
if (sep == NULL)
{
- /*
- * The caller specified a method without a colon separator, so let any
- * subsequent checks assign a default level.
- */
- free(firstpart);
- return;
+ *algorithm = pstrdup(option);
+ *detail = NULL;
}
-
- /* Check the contents after the colon separator. */
- sep++;
- if (*sep == '\0')
+ else
{
- pg_log_error("no compression level defined for method %s", firstpart);
- exit(1);
- }
+ char *alg;
- /*
- * For any of the methods currently supported, the data after the
- * separator can just be an integer.
- */
- if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
- levelres))
- exit(1);
+ alg = palloc((sep - option) + 1);
+ memcpy(alg, option, sep - option);
+ alg[sep - option] = '\0';
- free(firstpart);
+ *algorithm = alg;
+ *detail = pstrdup(sep + 1);
+ }
}
/*
@@ -1200,7 +1140,8 @@ static bbstreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile)
+ bool expect_unterminated_tarfile,
+ bc_specification *compress)
{
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
@@ -1316,32 +1257,28 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
- if (compressmethod == COMPRESSION_NONE ||
- compressloc != COMPRESS_LOCATION_CLIENT)
+ if (compress->algorithm == BACKUP_COMPRESSION_NONE)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- else if (compressmethod == COMPRESSION_GZIP)
+ else if (compress->algorithm == BACKUP_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
- archive_file,
- compresslevel);
+ archive_file, compress);
}
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (compress->algorithm == BACKUP_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_lz4_compressor_new(streamer, compress);
}
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (compress->algorithm == BACKUP_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1395,13 +1332,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ if (format == 'p')
{
- if (compressmethod == COMPRESSION_GZIP)
+ if (is_tar_gz)
streamer = bbstreamer_gzip_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (is_tar_lz4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (is_tar_zstd)
streamer = bbstreamer_zstd_decompressor_new(streamer);
}
@@ -1415,13 +1352,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* manifest if present - as a single COPY stream.
*/
static void
-ReceiveArchiveStream(PGconn *conn)
+ReceiveArchiveStream(PGconn *conn, bc_specification *compress)
{
ArchiveStreamState state;
/* Set up initial state. */
memset(&state, 0, sizeof(state));
state.tablespacenum = -1;
+ state.compress = compress;
/* All the real work happens in ReceiveArchiveStreamChunk. */
ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
@@ -1542,7 +1480,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
CreateBackupStreamer(archive_name,
spclocation,
&state->manifest_inject_streamer,
- true, false);
+ true, false,
+ state->compress);
}
break;
}
@@ -1743,7 +1682,7 @@ ReportCopyDataParseError(size_t r, char *copybuf)
*/
static void
ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum)
+ bool tablespacenum, bc_specification *compress)
{
WriteTarState state;
bbstreamer *manifest_inject_streamer;
@@ -1759,7 +1698,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
state.streamer = CreateBackupStreamer(archive_name, spclocation,
&manifest_inject_streamer,
is_recovery_guc_supported,
- expect_unterminated_tarfile);
+ expect_unterminated_tarfile,
+ compress);
state.tablespacenum = tablespacenum;
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
progress_update_filename(NULL);
@@ -1902,7 +1842,8 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
}
static void
-BaseBackup(void)
+BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc, bc_specification *client_compress)
{
PGresult *res;
char *sysidentifier;
@@ -2055,33 +1996,17 @@ BaseBackup(void)
if (compressloc == COMPRESS_LOCATION_SERVER)
{
- char *compressmethodstr = NULL;
-
if (!use_new_option_syntax)
{
pg_log_error("server does not support server-side compression");
exit(1);
}
- switch (compressmethod)
- {
- case COMPRESSION_GZIP:
- compressmethodstr = "gzip";
- break;
- case COMPRESSION_LZ4:
- compressmethodstr = "lz4";
- break;
- case COMPRESSION_ZSTD:
- compressmethodstr = "zstd";
- break;
- default:
- Assert(false);
- break;
- }
AppendStringCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION", compressmethodstr);
- if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
- AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION_LEVEL", compresslevel);
+ "COMPRESSION", compression_algorithm);
+ if (compression_detail != NULL)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_DETAIL",
+ compression_detail);
}
if (verbose)
@@ -2207,15 +2132,33 @@ BaseBackup(void)
*/
if (includewal == STREAM_WAL)
{
+ WalCompressionMethod wal_compress_method;
+ int wal_compress_level;
+
if (verbose)
pg_log_info("starting background WAL receiver");
- StartLogStreamer(xlogstart, starttli, sysidentifier);
+
+ if (client_compress->algorithm == BACKUP_COMPRESSION_GZIP)
+ {
+ wal_compress_method = COMPRESSION_GZIP;
+ wal_compress_level =
+ (client_compress->options & BACKUP_COMPRESSION_OPTION_LEVEL)
+ != 0 ? client_compress->level : 0;
+ }
+ else
+ {
+ wal_compress_method = COMPRESSION_NONE;
+ wal_compress_level = 0;
+ }
+
+ StartLogStreamer(xlogstart, starttli, sysidentifier,
+ wal_compress_method, wal_compress_level);
}
if (serverMajor >= 1500)
{
/* Receive a single tar stream with everything. */
- ReceiveArchiveStream(conn);
+ ReceiveArchiveStream(conn, client_compress);
}
else
{
@@ -2244,7 +2187,8 @@ BaseBackup(void)
spclocation = PQgetvalue(res, i, 1);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ ReceiveTarFile(conn, archive_name, spclocation, i,
+ client_compress);
}
/*
@@ -2511,6 +2455,10 @@ main(int argc, char **argv)
int c;
int option_index;
+ char *compression_algorithm = "none";
+ char *compression_detail = NULL;
+ CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
+ bc_specification client_compress;
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2616,17 +2564,13 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
- compressmethod = COMPRESSION_GZIP;
+ compression_algorithm = "gzip";
+ compression_detail = NULL;
compressloc = COMPRESS_LOCATION_UNSPECIFIED;
break;
case 'Z':
- parse_compress_options(optarg, &compressmethod,
- &compressloc, &compresslevel);
+ parse_compress_options(optarg, &compression_algorithm,
+ &compression_detail, &compressloc);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2753,12 +2697,11 @@ main(int argc, char **argv)
}
/*
- * If we're compressing the backup and the user has not said where to
- * perform the compression, do it on the client, unless they specified
- * --target, in which case the server is the only choice.
+ * If the user has not specified where to perform backup compression,
+ * default to the client, unless the user specified --target, in which case
+ * the server is the only choice.
*/
- if (compressmethod != COMPRESSION_NONE &&
- compressloc == COMPRESS_LOCATION_UNSPECIFIED)
+ if (compressloc == COMPRESS_LOCATION_UNSPECIFIED)
{
if (backup_target == NULL)
compressloc = COMPRESS_LOCATION_CLIENT;
@@ -2766,6 +2709,40 @@ main(int argc, char **argv)
compressloc = COMPRESS_LOCATION_SERVER;
}
+ /*
+ * If any compression that we're doing is happening on the client side,
+ * we must try to parse the compression algorithm and detail, but if it's
+ * all on the server side, then we're just going to pass through whatever
+ * was requested and let the server decide what to do.
+ */
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ bc_algorithm alg;
+ char *error_detail;
+
+ if (!parse_bc_algorithm(compression_algorithm, &alg))
+ {
+ pg_log_error("unrecognized compression algorithm \"%s\"",
+ compression_algorithm);
+ exit(1);
+ }
+
+ parse_bc_specification(alg, compression_detail, &client_compress);
+ error_detail = validate_bc_specification(&client_compress);
+ if (error_detail != NULL)
+ {
+ pg_log_error("invalid compression specification: %s",
+ error_detail);
+ exit(1);
+ }
+ }
+ else
+ {
+ Assert(compressloc == COMPRESS_LOCATION_SERVER);
+ client_compress.algorithm = BACKUP_COMPRESSION_NONE;
+ client_compress.options = 0;
+ }
+
/*
* Can't perform client-side compression if the backup is not being
* sent to the client.
@@ -2779,9 +2756,10 @@ main(int argc, char **argv)
}
/*
- * Compression doesn't make sense unless tar format is in use.
+ * Client-side compression doesn't make sense unless tar format is in use.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT)
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT &&
+ client_compress.algorithm != BACKUP_COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2882,56 +2860,6 @@ main(int argc, char **argv)
}
}
- /* Sanity checks for compression-related options. */
- switch (compressmethod)
- {
- case COMPRESSION_NONE:
- if (compresslevel != 0)
- {
- pg_log_error("cannot use compression level with method %s",
- "none");
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
- progname);
- exit(1);
- }
- break;
- case COMPRESSION_GZIP:
- if (compresslevel > 9)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 9",
- compresslevel, "gzip");
- exit(1);
- }
- if (compressloc == COMPRESS_LOCATION_CLIENT)
- {
-#ifdef HAVE_LIBZ
- if (compresslevel == 0)
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- pg_log_error("this build does not support compression with %s",
- "gzip");
- exit(1);
-#endif
- }
- break;
- case COMPRESSION_LZ4:
- if (compresslevel > 12)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 12",
- compresslevel, "lz4");
- exit(1);
- }
- break;
- case COMPRESSION_ZSTD:
- if (compresslevel > 22)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 22",
- compresslevel, "zstd");
- exit(1);
- }
- break;
- }
-
/*
* Sanity checks for progress reporting options.
*/
@@ -3040,7 +2968,8 @@ main(int argc, char **argv)
free(linkloc);
}
- BaseBackup();
+ BaseBackup(compression_algorithm, compression_detail, compressloc,
+ &client_compress);
success = true;
return 0;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index efefe947d9..2869a239e7 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -42,16 +42,12 @@ $node->command_fails(['pg_basebackup'],
# Sanity checks for options
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:1' ],
- qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure if method "none" specified with compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none+' ],
- qr/\Qpg_basebackup: error: invalid value "none+" for option/,
+ qr/\Qunrecognized compression algorithm "none+"/,
'failure on incorrect separator to define compression level');
-$node->command_fails_like(
- [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:' ],
- qr/\Qpg_basebackup: error: no compression level defined for method none/,
- 'failure on missing compression level value');
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
@@ -89,6 +85,70 @@ print $conf "wal_level = replica\n";
close $conf;
$node->restart;
+# Now that we have a server that supports replication commands, test whether
+# certain invalid compression commands fail on the client side with client-side
+# compression and on the server side with server-side compression.
+my $client_fails =
+ 'pg_basebackup: error: ';
+my $server_fails =
+ 'pg_basebackup: error: could not initiate base backup: ERROR: ';
+my @compression_failure_tests = (
+ [
+ 'extrasquishy',
+ 'unrecognized compression algorithm "extrasquishy"',
+ 'failure on invalid compression algorithm'
+ ],
+ [
+ 'gzip:',
+ 'invalid compression specification: found empty string where a compression option was expected',
+ 'failure on empty compression options list'
+ ],
+ [
+ 'gzip:thunk',
+ 'invalid compression specification: unknown compression option "thunk"',
+ 'failure on unknown compression option'
+ ],
+ [
+ 'gzip:level',
+ 'invalid compression specification: compression option "level" requires a value',
+ 'failure on missing compression level'
+ ],
+ [
+ 'gzip:level=',
+ 'invalid compression specification: value for compression option "level" must be an integer',
+ 'failure on empty compression level'
+ ],
+ [
+ 'gzip:level=high',
+ 'invalid compression specification: value for compression option "level" must be an integer',
+ 'failure on non-numeric compression level'
+ ],
+ [
+ 'gzip:level=236',
+ 'invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9',
+ 'failure on out-of-range compression level'
+ ],
+ [
+ 'gzip:level=9,',
+ 'invalid compression specification: found empty string where a compression option was expected',
+ 'failure on extra, empty compression option'
+ ],
+);
+for my $cft (@compression_failure_tests)
+{
+ my $cfail = quotemeta($client_fails . $cft->[1]);
+ my $sfail = quotemeta($server_fails . $cft->[1]);
+ $node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', $cft->[0] ],
+ qr/$cfail/,
+ 'client ' . $cft->[2]);
+ $node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress',
+ 'server-' . $cft->[0] ],
+ qr/$sfail/,
+ 'server ' . $cft->[2]);
+}
+
# Write some files to test that they are not copied.
foreach my $filename (
qw(backup_label tablespace_map postgresql.auto.conf.tmp
diff --git a/src/common/Makefile b/src/common/Makefile
index 31c0dd366d..f627349835 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -47,6 +47,7 @@ LIBS += $(PTHREAD_LIBS)
OBJS_COMMON = \
archive.o \
+ backup_compression.o \
base64.o \
checksum_helper.o \
config_info.o \
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
new file mode 100644
index 0000000000..46fa766130
--- /dev/null
+++ b/src/common/backup_compression.c
@@ -0,0 +1,269 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.c
+ *
+ * Shared code for backup compression methods and specifications.
+ *
+ * A compression specification specifies the parameters that should be used
+ * when performing compression with a specific algorithm. The simplest
+ * possible compression specification is an integer, which sets the
+ * compression level.
+ *
+ * Otherwise, a compression specification is a comma-separated list of items,
+ * each having the form keyword or keyword=value.
+ *
+ * Currently, the only supported keyword is "level".
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.c
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/backup_compression.h"
+
+static int expect_integer_value(char *keyword, char *value,
+ bc_specification *result);
+
+/*
+ * Look up a compression algorithm by name. Returns true and sets *algorithm
+ * if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_bc_algorithm(char *name, bc_algorithm *algorithm)
+{
+ if (strcmp(name, "none") == 0)
+ *algorithm = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(name, "gzip") == 0)
+ *algorithm = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(name, "lz4") == 0)
+ *algorithm = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(name, "zstd") == 0)
+ *algorithm = BACKUP_COMPRESSION_ZSTD;
+ else
+ return false;
+ return true;
+}
+
+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+const char *
+get_bc_algorithm_name(bc_algorithm algorithm)
+{
+ switch (algorithm)
+ {
+ case BACKUP_COMPRESSION_NONE:
+ return "none";
+ case BACKUP_COMPRESSION_GZIP:
+ return "gzip";
+ case BACKUP_COMPRESSION_LZ4:
+ return "lz4";
+ case BACKUP_COMPRESSION_ZSTD:
+ return "zstd";
+ /* no default, to provoke compiler warnings if values are added */
+ }
+ Assert(false);
+ return "???"; /* keep compiler quiet */
+}
+
+/*
+ * Parse a compression specification for a specified algorithm.
+ *
+ * See the file header comments for a brief description of what a compression
+ * specification is expected to look like.
+ *
+ * On return, all fields of the result object will be initialized.
+ * In particular, result->parse_error will be NULL if no errors occurred
+ * during parsing, and will otherwise contain a an appropriate error message.
+ * The caller may free this error message string using pfree, if desired.
+ * Note, however, even if there's no parse error, the string might not make
+ * sense: e.g. for gzip, level=12 is not sensible, but it does parse OK.
+ *
+ * Use validate_bc_specification() to find out whether a compression
+ * specification is semantically sensible.
+ */
+void
+parse_bc_specification(bc_algorithm algorithm, char *specification,
+ bc_specification *result)
+{
+ int bare_level;
+ char *bare_level_endp;
+
+ /* Initial setup of result object. */
+ result->algorithm = algorithm;
+ result->options = 0;
+ result->level = -1;
+ result->parse_error = NULL;
+
+ /* If there is no specification, we're done already. */
+ if (specification == NULL)
+ return;
+
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
+ if (specification != bare_level_endp && *bare_level_endp == '\0')
+ {
+ result->level = bare_level;
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ return;
+ }
+
+ /* Look for comma-separated keyword or keyword=value entries. */
+ while (1)
+ {
+ char *kwstart;
+ char *kwend;
+ char *vstart;
+ char *vend;
+ int kwlen;
+ int vlen;
+ bool has_value;
+ char *keyword;
+ char *value;
+
+ /* Figure start, end, and length of next keyword and any value. */
+ kwstart = kwend = specification;
+ while (*kwend != '\0' && *kwend != ',' && *kwend != '=')
+ ++kwend;
+ kwlen = kwend - kwstart;
+ if (*kwend != '=')
+ {
+ vstart = vend = NULL;
+ vlen = 0;
+ has_value = false;
+ }
+ else
+ {
+ vstart = vend = kwend + 1;
+ while (*vend != '\0' && *vend != ',')
+ ++vend;
+ vlen = vend - vstart;
+ has_value = true;
+ }
+
+ /* Reject empty keyword. */
+ if (kwlen == 0)
+ {
+ result->parse_error =
+ pstrdup(_("found empty string where a compression option was expected"));
+ break;
+ }
+
+ /* Extract keyword and value as separate C strings. */
+ keyword = palloc(kwlen + 1);
+ memcpy(keyword, kwstart, kwlen);
+ keyword[kwlen] = '\0';
+ if (!has_value)
+ value = NULL;
+ else
+ {
+ value = palloc(vlen + 1);
+ memcpy(value, vstart, vlen);
+ value[vlen] = '\0';
+ }
+
+ /* Handle whatever keyword we found. */
+ if (strcmp(keyword, "level") == 0)
+ {
+ result->level = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ }
+ else
+ result->parse_error =
+ psprintf(_("unknown compression option \"%s\""), keyword);
+
+ /* Release memory, just to be tidy. */
+ pfree(keyword);
+ if (value != NULL)
+ pfree(value);
+
+ /*
+ * If we got an error or have reached the end of the string, stop. When
+ * there is no value, vend is NULL, so test kwend in that case rather
+ * than dereferencing a null pointer.
+ */
+ if (result->parse_error != NULL ||
+ (vend == NULL ? *kwend == '\0' : *vend == '\0'))
+ break;
+
+ /* Advance to next entry and loop around. */
+ specification = vend == NULL ? kwend + 1 : vend + 1;
+ }
+}
+
+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ if (value == NULL)
+ {
+ result->parse_error =
+ psprintf(_("compression option \"%s\" requires a value"),
+ keyword);
+ return -1;
+ }
+
+ ivalue = strtol(value, &ivalue_endp, 10);
+ if (ivalue_endp == value || *ivalue_endp != '\0')
+ {
+ result->parse_error =
+ psprintf(_("value for compression option \"%s\" must be an integer"),
+ keyword);
+ return -1;
+ }
+ return ivalue;
+}
+
+/*
+ * Returns NULL if the compression specification string was syntactically
+ * valid and semantically sensible. Otherwise, returns an error message.
+ *
+ * Does not test whether this build of PostgreSQL supports the requested
+ * compression method.
+ */
+char *
+validate_bc_specification(bc_specification *spec)
+{
+ /* If it didn't even parse OK, it's definitely no good. */
+ if (spec->parse_error != NULL)
+ return spec->parse_error;
+
+ /*
+ * If a compression level was specified, check that the algorithm expects
+ * a compression level and that the level is within the legal range for
+ * the algorithm.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ int min_level = 1;
+ int max_level;
+
+ if (spec->algorithm == BACKUP_COMPRESSION_GZIP)
+ max_level = 9;
+ else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
+ max_level = 12;
+ else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ max_level = 22;
+ else
+ return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
+ get_bc_algorithm_name(spec->algorithm));
+
+ if (spec->level < min_level || spec->level > max_level)
+ return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+ get_bc_algorithm_name(spec->algorithm),
+ min_level, max_level);
+ }
+
+ return NULL;
+}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
new file mode 100644
index 0000000000..0565cbc657
--- /dev/null
+++ b/src/include/common/backup_compression.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.h
+ *
+ * Shared definitions for backup compression methods and specifications.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BACKUP_COMPRESSION_H
+#define BACKUP_COMPRESSION_H
+
+typedef enum bc_algorithm
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
+} bc_algorithm;
+
+#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+
+typedef struct bc_specification
+{
+ bc_algorithm algorithm;
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
+ int level;
+ char *parse_error; /* NULL if parsing was OK, else message */
+} bc_specification;
+
+extern bool parse_bc_algorithm(char *name, bc_algorithm *algorithm);
+extern const char *get_bc_algorithm_name(bc_algorithm algorithm);
+
+extern void parse_bc_specification(bc_algorithm algorithm,
+ char *specification,
+ bc_specification *result);
+
+extern char *validate_bc_specification(bc_specification *);
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a7f16758a4..654df28576 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -27,6 +27,7 @@
#define BASEBACKUP_SINK_H
#include "access/xlog_internal.h"
+#include "common/backup_compression.h"
#include "nodes/pg_list.h"
/* Forward declarations. */
@@ -283,9 +284,9 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
-extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_gzip_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_lz4_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_zstd_new(bbsink *next, bc_specification *);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 441d6ae6bf..de8676d339 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -124,7 +124,7 @@ sub mkvcbuild
}
our @pgcommonallfiles = qw(
- archive.c base64.c checksum_helper.c
+ archive.c backup_compression.c base64.c checksum_helper.c
config_info.c controldata_utils.c d2s.c encnames.c exec.c
f2s.c file_perm.c file_utils.c hashfn.c ip.c jsonapi.c
keywords.c kwlookup.c link-canary.c md5_common.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaf3e7a8d4..01748cac07 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3788,3 +3788,5 @@ yyscan_t
z_stream
z_streamp
zic_t
+bc_algorithm
+bc_specification
--
2.24.3 (Apple Git-128)
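To make the intended call pattern concrete, here is a minimal sketch of how a
frontend consumer would drive the new API (check_compression and its main()
are hypothetical, written for illustration only; the three steps mirror what
this patch makes pg_basebackup's main() do):

#include "postgres_fe.h"

#include "common/backup_compression.h"
#include "common/logging.h"

/*
 * Hypothetical wrapper: parse and validate a user-supplied algorithm name
 * and detail string, exiting on failure in the same style as pg_basebackup.
 */
static void
check_compression(char *algorithm_name, char *detail)
{
	bc_algorithm alg;
	bc_specification spec;
	char	   *error_detail;

	/* Step 1: map the algorithm name onto a bc_algorithm value. */
	if (!parse_bc_algorithm(algorithm_name, &alg))
	{
		pg_log_error("unrecognized compression algorithm \"%s\"",
					 algorithm_name);
		exit(1);
	}

	/* Step 2: parse the detail string; errors land in spec.parse_error. */
	parse_bc_specification(alg, detail, &spec);

	/* Step 3: check that what parsed is also semantically sensible. */
	error_detail = validate_bc_specification(&spec);
	if (error_detail != NULL)
	{
		pg_log_error("invalid compression specification: %s", error_detail);
		exit(1);
	}
}

int
main(void)
{
	pg_logging_init("check_compression");
	check_compression("gzip", "level=5");	/* accepted */
	check_compression("gzip", "level=42");	/* exits: level out of range */
	return 0;
}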
On Mon, Mar 21, 2022 at 12:57:36PM -0400, Robert Haas wrote:
> > typo: contain a an
> I searched for the "contain a an" typo that you mentioned but was not able to
> find it. Can you give me a more specific pointer?
Here:
+ * during parsing, and will otherwise contain a an appropriate error message.
> I looked a little bit more at the compression method vs. compression
> algorithm thing. I agree that there is some inconsistency in
> terminology here, but I'm still not sure that we are well-served by
> trying to make it totally uniform, especially if we pick the word
> "method" as the standard rather than "algorithm". In my opinion,
> "method" is less specific than "algorithm". If someone asks me to
> choose a compression algorithm, I know that I should give an answer
> like "lz4" or "zstd". If they ask me to pick a compression method, I'm
> not quite sure whether they want that kind of answer or whether they
> want something more detailed, like "use lz4 with compression level 3
> and a 1MB block size". After all, that is (at least according to my
> understanding of how English works) a perfectly valid answer to the
> question "what method should I use to compress this data?" -- but not
> to the question "what algorithm should I use to compress this data?".
> The latter can ONLY be properly answered by saying something like
> "lz4". And I think that's really the root of my hesitation to make the
> kinds of changes you want here.
I think "algorithm" could be much more nuanced than "lz4", but I also think
we've spent more than enough time on it now :)
--
Justin
On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> + * during parsing, and will otherwise contain a an appropriate error message.
OK, thanks. v4 attached.
> I think "algorithm" could be much more nuanced than "lz4", but I also think
> we've spent more than enough time on it now :)
Oh dear. But yes.
--
Robert Haas
EDB: http://www.enterprisedb.com
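As a quick reference for the grammar v4 settles on, here is a hypothetical
stand-alone demonstration (not part of the patch) of how a few detail strings
decompose; the expected results follow from the parser and validator that the
patch adds in src/common/backup_compression.c:

#include "postgres_fe.h"

#include "common/backup_compression.h"

int
main(void)
{
	bc_specification spec;

	/* A bare integer is still accepted and interpreted as the level. */
	parse_bc_specification(BACKUP_COMPRESSION_GZIP, "5", &spec);
	printf("level=%d error=%s\n", spec.level,
		   spec.parse_error ? spec.parse_error : "none");

	/* The keyword form is equivalent. */
	parse_bc_specification(BACKUP_COMPRESSION_GZIP, "level=5", &spec);
	printf("level=%d error=%s\n", spec.level,
		   spec.parse_error ? spec.parse_error : "none");

	/* Unknown keywords still parse, but record an error message. */
	parse_bc_specification(BACKUP_COMPRESSION_GZIP, "fries=true", &spec);
	printf("error=%s\n", spec.parse_error);

	/* Syntactically fine but out of range: the validator catches it. */
	parse_bc_specification(BACKUP_COMPRESSION_GZIP, "level=236", &spec);
	printf("%s\n", validate_bc_specification(&spec));

	return 0;
}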
Attachments:
v4-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
From 2481134c810e8f3526852544868e5c551ccd4ee5 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 21 Mar 2022 14:24:15 -0400
Subject: [PATCH v4] Replace BASE_BACKUP COMPRESSION_LEVEL option with
COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby.
---
doc/src/sgml/protocol.sgml | 18 +-
doc/src/sgml/ref/pg_basebackup.sgml | 25 +-
src/backend/replication/basebackup.c | 62 +--
src/backend/replication/basebackup_gzip.c | 20 +-
src/backend/replication/basebackup_lz4.c | 19 +-
src/backend/replication/basebackup_zstd.c | 19 +-
src/bin/pg_basebackup/bbstreamer.h | 7 +-
src/bin/pg_basebackup/bbstreamer_gzip.c | 7 +-
src/bin/pg_basebackup/bbstreamer_lz4.c | 4 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 4 +-
src/bin/pg_basebackup/pg_basebackup.c | 409 ++++++++-----------
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 72 +++-
src/common/Makefile | 1 +
src/common/backup_compression.c | 269 ++++++++++++
src/include/common/backup_compression.h | 44 ++
src/include/replication/basebackup_sink.h | 7 +-
src/tools/msvc/Mkvcbuild.pm | 2 +-
src/tools/pgindent/typedefs.list | 2 +
18 files changed, 662 insertions(+), 329 deletions(-)
create mode 100644 src/common/backup_compression.c
create mode 100644 src/include/common/backup_compression.h
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <term><literal>COMPRESSION_DETAIL</literal> <replaceable>detail</replaceable></term>
<listitem>
<para>
Specifies the compression level to be used. This should only be
used in conjunction with the <literal>COMPRESSION</literal> option.
- For <literal>gzip</literal> the value should be an integer between 1
- and 9, for <literal>lz4</literal> between 1 and 12, and for
- <literal>zstd</literal> it should be between 1 and 22.
+ If the value is an integer, it specifies the compression level.
+ Otherwise, it should be a comma-separated list of items, each of
+ the form <literal>keyword</literal> or
+ <literal>keyword=value</literal>. Currently, the only supported
+ keyword is <literal>level</literal>, which sets the compression
+ level.
+ </para>
+
+ <para>
+ For <literal>gzip</literal> the compression level should be an
+ integer between 1 and 9, for <literal>lz4</literal> an integer
+ between 1 and 12, and for <literal>zstd</literal> an integer
+ between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 4a630b59b7..46d7f15e54 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -399,9 +399,9 @@ PostgreSQL documentation
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
- <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>detail</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>detail</replaceable>]</term>
<listitem>
<para>
Requests compression of the backup. If <literal>client</literal> or
@@ -419,13 +419,20 @@ PostgreSQL documentation
<para>
The compression method can be set to <literal>gzip</literal>,
<literal>lz4</literal>, <literal>zstd</literal>, or
- <literal>none</literal> for no compression. A compression level can
- optionally be specified, by appending the level number after a colon
- (<literal>:</literal>). If no level is specified, the default
- compression level will be used. If only a level is specified without
- mentioning an algorithm, <literal>gzip</literal> compression will be
- used if the level is greater than 0, and no compression will be used if
- the level is 0.
+ <literal>none</literal> for no compression. A compression detail
+ string can optionally be specified. If the detail string is an
+ integer, it specifies the compression level. Otherwise, it should be
+ a comma-separated list of items, each of the form
+ <literal>keyword</literal> or <literal>keyword=value</literal>.
+ Currently, the only supported keyword is <literal>level</literal>,
+ which sets the compression level.
+ </para>
+ <para>
+ If no compression level is specified, the default compression level
+ will be used. If only a level is specified without mentioning an
+ algorithm, <literal>gzip</literal> compression will be used if the
+ level is greater than 0, and no compression will be used if the level
+ is 0.
</para>
<para>
When the tar format is used with <literal>gzip</literal>,
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2aedc14a2..49deead091 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,6 +17,7 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
#include "lib/stringinfo.h"
@@ -54,14 +55,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4,
- BACKUP_COMPRESSION_ZSTD
-} basebackup_compression_type;
-
typedef struct
{
const char *label;
@@ -75,8 +68,8 @@ typedef struct
bool use_copytblspc;
BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
- basebackup_compression_type compression;
- int compression_level;
+ bc_algorithm compression;
+ bc_specification compression_specification;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -713,12 +706,14 @@ parse_basebackup_options(List *options, basebackup_options *opt)
char *target_str = NULL;
char *target_detail_str = NULL;
bool o_compression = false;
- bool o_compression_level = false;
+ bool o_compression_detail = false;
+ char *compression_detail_str = NULL;
MemSet(opt, 0, sizeof(*opt));
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
+ opt->compression_specification.algorithm = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -885,29 +880,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "none") == 0)
- opt->compression = BACKUP_COMPRESSION_NONE;
- else if (strcmp(optval, "gzip") == 0)
- opt->compression = BACKUP_COMPRESSION_GZIP;
- else if (strcmp(optval, "lz4") == 0)
- opt->compression = BACKUP_COMPRESSION_LZ4;
- else if (strcmp(optval, "zstd") == 0)
- opt->compression = BACKUP_COMPRESSION_ZSTD;
- else
+ if (!parse_bc_algorithm(optval, &opt->compression))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized compression algorithm: \"%s\"",
+ errmsg("unrecognized compression algorithm \"%s\"",
optval)));
o_compression = true;
}
- else if (strcmp(defel->defname, "compression_level") == 0)
+ else if (strcmp(defel->defname, "compression_detail") == 0)
{
- if (o_compression_level)
+ if (o_compression_detail)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->compression_level = defGetInt32(defel);
- o_compression_level = true;
+ compression_detail_str = defGetString(defel);
+ o_compression_detail = true;
}
else
ereport(ERROR,
@@ -949,10 +936,25 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_handle =
BaseBackupGetTargetHandle(target_str, target_detail_str);
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("compression level requires compression")));
+ errmsg("compression detail requires compression")));
+
+ if (o_compression)
+ {
+ char *error_detail;
+
+ parse_bc_specification(opt->compression, compression_detail_str,
+ &opt->compression_specification);
+ error_detail =
+ validate_bc_specification(&opt->compression_specification);
+ if (error_detail != NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid compression specification: %s",
+ error_detail));
+ }
}
@@ -998,11 +1000,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
- sink = bbsink_gzip_new(sink, opt.compression_level);
+ sink = bbsink_gzip_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
- sink = bbsink_lz4_new(sink, opt.compression_level);
+ sink = bbsink_lz4_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
- sink = bbsink_zstd_new(sink, opt.compression_level);
+ sink = bbsink_zstd_new(sink, &opt.compression_specification);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index b66d3da7a3..703a91ba77 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_gzip_ops = {
#endif
/*
- * Create a new basebackup sink that performs gzip compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs gzip compression.
*/
bbsink *
-bbsink_gzip_new(bbsink *next, int compresslevel)
+bbsink_gzip_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef HAVE_LIBZ
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,15 +72,14 @@ bbsink_gzip_new(bbsink *next, int compresslevel)
bbsink_gzip *sink;
Assert(next != NULL);
- Assert(compresslevel >= 0 && compresslevel <= 9);
- if (compresslevel == 0)
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
compresslevel = Z_DEFAULT_COMPRESSION;
- else if (compresslevel < 0 || compresslevel > 9)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("gzip compression level %d is out of range",
- compresslevel)));
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 9);
+ }
sink = palloc0(sizeof(bbsink_gzip));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index d838f723d0..06c161ddc4 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_lz4_ops = {
#endif
/*
- * Create a new basebackup sink that performs lz4 compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs lz4 compression.
*/
bbsink *
-bbsink_lz4_new(bbsink *next, int compresslevel)
+bbsink_lz4_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_LZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -72,11 +73,13 @@ bbsink_lz4_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 12)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("lz4 compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 12);
+ }
sink = palloc0(sizeof(bbsink_lz4));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27..96b7985693 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -55,12 +55,13 @@ const bbsink_ops bbsink_zstd_ops = {
#endif
/*
- * Create a new basebackup sink that performs zstd compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs zstd compression.
*/
bbsink *
-bbsink_zstd_new(bbsink *next, int compresslevel)
+bbsink_zstd_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_ZSTD
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,11 +72,13 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 22)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("zstd compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 22);
+ }
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index 02d4c05df6..dfa3f77af4 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -22,6 +22,7 @@
#ifndef BBSTREAMER_H
#define BBSTREAMER_H
+#include "common/backup_compression.h"
#include "lib/stringinfo.h"
#include "pqexpbuffer.h"
@@ -200,17 +201,17 @@ bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
*/
extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 894f857103..1979e95639 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -76,7 +76,8 @@ const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
* closed so that the data may be written there.
*/
bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ bc_specification *compress)
{
#ifdef HAVE_LIBZ
bbstreamer_gzip_writer *streamer;
@@ -115,11 +116,11 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
}
}
- if (gzsetparams(streamer->gzfile, compresslevel,
+ if (gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
+ compress->level, get_gz_error(streamer->gzfile));
exit(1);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index 810052e4e3..a6ec317e2b 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -67,7 +67,7 @@ const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
@@ -89,7 +89,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compresslevel;
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb..caa5edcaf1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -63,7 +63,7 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
@@ -85,7 +85,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
/* Initialize stream compression preferences */
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
+ compress->level);
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2943d9ec1a..3e6977df1a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
#include "access/xlog_internal.h"
#include "bbstreamer.h"
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -57,6 +58,7 @@ typedef struct TablespaceList
typedef struct ArchiveStreamState
{
int tablespacenum;
+ bc_specification *compress;
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
@@ -132,9 +134,6 @@ static bool checksum_failure = false;
static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
-static int compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
-static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -198,7 +197,8 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile);
+ bool expect_unterminated_tarfile,
+ bc_specification *compress);
static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
void *callback_data);
static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
@@ -207,7 +207,7 @@ static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum);
+ bool tablespacenum, bc_specification *compress);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
@@ -215,7 +215,9 @@ static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
static void ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf);
static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
void *callback_data);
-static void BaseBackup(void);
+static void BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc,
+ bc_specification *client_compress);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
bool segment_finished);
@@ -405,8 +407,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"
- " compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=[{client|server}-]METHOD[:DETAIL]\n"
+ " compress on client or server as specified\n"));
printf(_(" -Z, --compress=none do not compress tar output\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -542,7 +544,9 @@ typedef struct
} logstreamer_param;
static int
-LogStreamerMain(logstreamer_param *param)
+LogStreamerMain(logstreamer_param *param,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
StreamCtl stream;
@@ -565,25 +569,14 @@ LogStreamerMain(logstreamer_param *param)
stream.mark_done = true;
stream.partial_suffix = NULL;
stream.replication_slot = replication_slot;
-
if (format == 'p')
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
COMPRESSION_NONE, 0,
stream.do_sync);
- else if (compressloc != COMPRESS_LOCATION_CLIENT)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
- stream.do_sync);
- else if (compressmethod == COMPRESSION_GZIP)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- compressmethod,
- compresslevel,
- stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
+ wal_compress_method,
+ wal_compress_level,
stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
@@ -629,7 +622,9 @@ LogStreamerMain(logstreamer_param *param)
* stream the logfile in parallel with the backups.
*/
static void
-StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
+StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
logstreamer_param *param;
uint32 hi,
@@ -729,7 +724,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
int ret;
/* in child process */
- ret = LogStreamerMain(param);
+ ret = LogStreamerMain(param, wal_compress_method, wal_compress_level);
/* temp debugging aid to analyze 019_replslot_limit failures */
if (verbose)
@@ -1004,136 +999,81 @@ parse_max_rate(char *src)
}
/*
- * Utility wrapper to parse the values specified for -Z/--compress.
- * *methodres and *levelres will be optionally filled with values coming
- * from the parsed results.
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
+ *
+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwich, the requested algorithm is "turkey"
+ * and the detail string is "sandwich". We'll sort out whether that's legal
+ * at a later stage.
*/
static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
- CompressionLocation *locationres, int *levelres)
+parse_compress_options(char *option, char **algorithm, char **detail,
+ CompressionLocation *locationres)
{
char *sep;
- int firstlen;
- char *firstpart;
+ char *endp;
/*
- * clear 'levelres' so that if there are multiple compression options,
- * the last one fully overrides the earlier ones
- */
- *levelres = 0;
-
- /* check if the option is split in two */
- sep = strchr(src, ':');
-
- /*
- * The first part of the option value could be a method name, or just a
- * level value.
- */
- firstlen = (sep != NULL) ? (sep - src) : strlen(src);
- firstpart = pg_malloc(firstlen + 1);
- memcpy(firstpart, src, firstlen);
- firstpart[firstlen] = '\0';
-
- /*
- * Check if the first part of the string matches with a supported
- * compression method.
+ * Check whether the compression specification consists of a bare integer.
+ *
+ * If so, for backward compatibility, assume gzip.
*/
- if (pg_strcasecmp(firstpart, "gzip") == 0)
+ (void) strtol(option, &endp, 10);
+ if (*endp == '\0')
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ *algorithm = pstrdup("gzip");
+ *detail = pstrdup(option);
+ return;
}
- else if (pg_strcasecmp(firstpart, "client-gzip") == 0)
- {
- *methodres = COMPRESSION_GZIP;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-gzip") == 0)
+
+ /* Strip off any "client-" or "server-" prefix. */
+ if (strncmp(option, "server-", 7) == 0)
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
+ option += 7;
}
- else if (pg_strcasecmp(firstpart, "lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ else if (strncmp(option, "client-", 7) == 0)
{
- *methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "none") == 0)
- {
- *methodres = COMPRESSION_NONE;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ option += 7;
}
else
- {
- /*
- * It does not match anything known, so check for the
- * backward-compatible case of only an integer where the implied
- * compression method changes depending on the level value.
- */
- if (!option_parse_int(firstpart, "-Z/--compress", 0,
- INT_MAX, levelres))
- exit(1);
-
- *methodres = (*levelres > 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
- free(firstpart);
- return;
- }
-
+ /*
+ * Check whether there is a compression detail following the algorithm
+ * name.
+ */
+ sep = strchr(option, ':');
if (sep == NULL)
{
- /*
- * The caller specified a method without a colon separator, so let any
- * subsequent checks assign a default level.
- */
- free(firstpart);
- return;
+ *algorithm = pstrdup(option);
+ *detail = NULL;
}
-
- /* Check the contents after the colon separator. */
- sep++;
- if (*sep == '\0')
+ else
{
- pg_log_error("no compression level defined for method %s", firstpart);
- exit(1);
- }
+ char *alg;
- /*
- * For any of the methods currently supported, the data after the
- * separator can just be an integer.
- */
- if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
- levelres))
- exit(1);
+ alg = palloc((sep - option) + 1);
+ memcpy(alg, option, sep - option);
+ alg[sep - option] = '\0';
- free(firstpart);
+ *algorithm = alg;
+ *detail = pstrdup(sep + 1);
+ }
}
/*
@@ -1200,7 +1140,8 @@ static bbstreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile)
+ bool expect_unterminated_tarfile,
+ bc_specification *compress)
{
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
@@ -1316,32 +1257,28 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
- if (compressmethod == COMPRESSION_NONE ||
- compressloc != COMPRESS_LOCATION_CLIENT)
+ if (compress->algorithm == BACKUP_COMPRESSION_NONE)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- else if (compressmethod == COMPRESSION_GZIP)
+ else if (compress->algorithm == BACKUP_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
- archive_file,
- compresslevel);
+ archive_file, compress);
}
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (compress->algorithm == BACKUP_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_lz4_compressor_new(streamer, compress);
}
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (compress->algorithm == BACKUP_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1395,13 +1332,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ if (format == 'p')
{
- if (compressmethod == COMPRESSION_GZIP)
+ if (is_tar_gz)
streamer = bbstreamer_gzip_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (is_tar_lz4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (is_tar_zstd)
streamer = bbstreamer_zstd_decompressor_new(streamer);
}
@@ -1415,13 +1352,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* manifest if present - as a single COPY stream.
*/
static void
-ReceiveArchiveStream(PGconn *conn)
+ReceiveArchiveStream(PGconn *conn, bc_specification *compress)
{
ArchiveStreamState state;
/* Set up initial state. */
memset(&state, 0, sizeof(state));
state.tablespacenum = -1;
+ state.compress = compress;
/* All the real work happens in ReceiveArchiveStreamChunk. */
ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
@@ -1542,7 +1480,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
CreateBackupStreamer(archive_name,
spclocation,
&state->manifest_inject_streamer,
- true, false);
+ true, false,
+ state->compress);
}
break;
}
@@ -1743,7 +1682,7 @@ ReportCopyDataParseError(size_t r, char *copybuf)
*/
static void
ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum)
+ bool tablespacenum, bc_specification *compress)
{
WriteTarState state;
bbstreamer *manifest_inject_streamer;
@@ -1759,7 +1698,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
state.streamer = CreateBackupStreamer(archive_name, spclocation,
&manifest_inject_streamer,
is_recovery_guc_supported,
- expect_unterminated_tarfile);
+ expect_unterminated_tarfile,
+ compress);
state.tablespacenum = tablespacenum;
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
progress_update_filename(NULL);
@@ -1902,7 +1842,8 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
}
static void
-BaseBackup(void)
+BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc, bc_specification *client_compress)
{
PGresult *res;
char *sysidentifier;
@@ -2055,33 +1996,17 @@ BaseBackup(void)
if (compressloc == COMPRESS_LOCATION_SERVER)
{
- char *compressmethodstr = NULL;
-
if (!use_new_option_syntax)
{
pg_log_error("server does not support server-side compression");
exit(1);
}
- switch (compressmethod)
- {
- case COMPRESSION_GZIP:
- compressmethodstr = "gzip";
- break;
- case COMPRESSION_LZ4:
- compressmethodstr = "lz4";
- break;
- case COMPRESSION_ZSTD:
- compressmethodstr = "zstd";
- break;
- default:
- Assert(false);
- break;
- }
AppendStringCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION", compressmethodstr);
- if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
- AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION_LEVEL", compresslevel);
+ "COMPRESSION", compression_algorithm);
+ if (compression_detail != NULL)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_DETAIL",
+ compression_detail);
}
if (verbose)
@@ -2207,15 +2132,33 @@ BaseBackup(void)
*/
if (includewal == STREAM_WAL)
{
+ WalCompressionMethod wal_compress_method;
+ int wal_compress_level;
+
if (verbose)
pg_log_info("starting background WAL receiver");
- StartLogStreamer(xlogstart, starttli, sysidentifier);
+
+ if (client_compress->algorithm == BACKUP_COMPRESSION_GZIP)
+ {
+ wal_compress_method = COMPRESSION_GZIP;
+ wal_compress_level =
+ (client_compress->options & BACKUP_COMPRESSION_OPTION_LEVEL)
+ != 0 ? client_compress->level : 0;
+ }
+ else
+ {
+ wal_compress_method = COMPRESSION_NONE;
+ wal_compress_level = 0;
+ }
+
+ StartLogStreamer(xlogstart, starttli, sysidentifier,
+ wal_compress_method, wal_compress_level);
}
if (serverMajor >= 1500)
{
/* Receive a single tar stream with everything. */
- ReceiveArchiveStream(conn);
+ ReceiveArchiveStream(conn, client_compress);
}
else
{
@@ -2244,7 +2187,8 @@ BaseBackup(void)
spclocation = PQgetvalue(res, i, 1);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ ReceiveTarFile(conn, archive_name, spclocation, i,
+ client_compress);
}
/*
@@ -2511,6 +2455,10 @@ main(int argc, char **argv)
int c;
int option_index;
+ char *compression_algorithm = "none";
+ char *compression_detail = NULL;
+ CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
+ bc_specification client_compress;
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2616,17 +2564,13 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
- compressmethod = COMPRESSION_GZIP;
+ compression_algorithm = "gzip";
+ compression_detail = NULL;
compressloc = COMPRESS_LOCATION_UNSPECIFIED;
break;
case 'Z':
- parse_compress_options(optarg, &compressmethod,
- &compressloc, &compresslevel);
+ parse_compress_options(optarg, &compression_algorithm,
+ &compression_detail, &compressloc);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2753,12 +2697,11 @@ main(int argc, char **argv)
}
/*
- * If we're compressing the backup and the user has not said where to
- * perform the compression, do it on the client, unless they specified
- * --target, in which case the server is the only choice.
+ * If the user has not specified where to perform backup compression,
+ * default to the client, unless the user specified --target, in which case
+ * the server is the only choice.
*/
- if (compressmethod != COMPRESSION_NONE &&
- compressloc == COMPRESS_LOCATION_UNSPECIFIED)
+ if (compressloc == COMPRESS_LOCATION_UNSPECIFIED)
{
if (backup_target == NULL)
compressloc = COMPRESS_LOCATION_CLIENT;
@@ -2766,6 +2709,40 @@ main(int argc, char **argv)
compressloc = COMPRESS_LOCATION_SERVER;
}
+ /*
+ * If any compression that we're doing is happening on the client side,
+ * we must try to parse the compression algorithm and detail, but if it's
+ * all on the server side, then we're just going to pass through whatever
+ * was requested and let the server decide what to do.
+ */
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ bc_algorithm alg;
+ char *error_detail;
+
+ if (!parse_bc_algorithm(compression_algorithm, &alg))
+ {
+ pg_log_error("unrecognized compression algorithm \"%s\"",
+ compression_algorithm);
+ exit(1);
+ }
+
+ parse_bc_specification(alg, compression_detail, &client_compress);
+ error_detail = validate_bc_specification(&client_compress);
+ if (error_detail != NULL)
+ {
+ pg_log_error("invalid compression specification: %s",
+ error_detail);
+ exit(1);
+ }
+ }
+ else
+ {
+ Assert(compressloc == COMPRESS_LOCATION_SERVER);
+ client_compress.algorithm = BACKUP_COMPRESSION_NONE;
+ client_compress.options = 0;
+ }
+
/*
* Can't perform client-side compression if the backup is not being
* sent to the client.
@@ -2779,9 +2756,10 @@ main(int argc, char **argv)
}
/*
- * Compression doesn't make sense unless tar format is in use.
+ * Client-side compression doesn't make sense unless tar format is in use.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT)
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT &&
+ client_compress.algorithm != BACKUP_COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2882,56 +2860,6 @@ main(int argc, char **argv)
}
}
- /* Sanity checks for compression-related options. */
- switch (compressmethod)
- {
- case COMPRESSION_NONE:
- if (compresslevel != 0)
- {
- pg_log_error("cannot use compression level with method %s",
- "none");
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
- progname);
- exit(1);
- }
- break;
- case COMPRESSION_GZIP:
- if (compresslevel > 9)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 9",
- compresslevel, "gzip");
- exit(1);
- }
- if (compressloc == COMPRESS_LOCATION_CLIENT)
- {
-#ifdef HAVE_LIBZ
- if (compresslevel == 0)
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- pg_log_error("this build does not support compression with %s",
- "gzip");
- exit(1);
-#endif
- }
- break;
- case COMPRESSION_LZ4:
- if (compresslevel > 12)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 12",
- compresslevel, "lz4");
- exit(1);
- }
- break;
- case COMPRESSION_ZSTD:
- if (compresslevel > 22)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 22",
- compresslevel, "zstd");
- exit(1);
- }
- break;
- }
-
/*
* Sanity checks for progress reporting options.
*/
@@ -3040,7 +2968,8 @@ main(int argc, char **argv)
free(linkloc);
}
- BaseBackup();
+ BaseBackup(compression_algorithm, compression_detail, compressloc,
+ &client_compress);
success = true;
return 0;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index efefe947d9..2869a239e7 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -42,16 +42,12 @@ $node->command_fails(['pg_basebackup'],
# Sanity checks for options
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:1' ],
- qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure if method "none" specified with compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none+' ],
- qr/\Qpg_basebackup: error: invalid value "none+" for option/,
+ qr/\Qunrecognized compression algorithm "none+"/,
'failure on incorrect separator to define compression level');
-$node->command_fails_like(
- [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:' ],
- qr/\Qpg_basebackup: error: no compression level defined for method none/,
- 'failure on missing compression level value');
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
@@ -89,6 +85,70 @@ print $conf "wal_level = replica\n";
close $conf;
$node->restart;
+# Now that we have a server that supports replication commands, test whether
+# certain invalid compression commands fail on the client side with client-side
+# compression and on the server side with server-side compression.
+my $client_fails =
+ 'pg_basebackup: error: ';
+my $server_fails =
+ 'pg_basebackup: error: could not initiate base backup: ERROR: ';
+my @compression_failure_tests = (
+ [
+ 'extrasquishy',
+ 'unrecognized compression algorithm "extrasquishy"',
+ 'failure on invalid compression algorithm'
+ ],
+ [
+ 'gzip:',
+ 'invalid compression specification: found empty string where a compression option was expected',
+ 'failure on empty compression options list'
+ ],
+ [
+ 'gzip:thunk',
+ 'invalid compression specification: unknown compression option "thunk"',
+ 'failure on unknown compression option'
+ ],
+ [
+ 'gzip:level',
+ 'invalid compression specification: compression option "level" requires a value',
+ 'failure on missing compression level'
+ ],
+ [
+ 'gzip:level=',
+ 'invalid compression specification: value for compression option "level" must be an integer',
+ 'failure on empty compression level'
+ ],
+ [
+ 'gzip:level=high',
+ 'invalid compression specification: value for compression option "level" must be an integer',
+ 'failure on non-numeric compression level'
+ ],
+ [
+ 'gzip:level=236',
+ 'invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9',
+ 'failure on out-of-range compression level'
+ ],
+ [
+ 'gzip:level=9,',
+ 'invalid compression specification: found empty string where a compression option was expected',
+ 'failure on extra, empty compression option'
+ ],
+);
+for my $cft (@compression_failure_tests)
+{
+ my $cfail = quotemeta($client_fails . $cft->[1]);
+ my $sfail = quotemeta($server_fails . $cft->[1]);
+ $node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', $cft->[0] ],
+ qr/$cfail/,
+ 'client '. $cft->[2]);
+ $node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress',
+ 'server-' . $cft->[0] ],
+ qr/$sfail/,
+ 'server ' . $cft->[2]);
+}
+
# Write some files to test that they are not copied.
foreach my $filename (
qw(backup_label tablespace_map postgresql.auto.conf.tmp
diff --git a/src/common/Makefile b/src/common/Makefile
index 31c0dd366d..f627349835 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -47,6 +47,7 @@ LIBS += $(PTHREAD_LIBS)
OBJS_COMMON = \
archive.o \
+ backup_compression.o \
base64.o \
checksum_helper.o \
config_info.o \
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
new file mode 100644
index 0000000000..fac5de157d
--- /dev/null
+++ b/src/common/backup_compression.c
@@ -0,0 +1,269 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.c
+ *
+ * Shared code for backup compression methods and specifications.
+ *
+ * A compression specification specifies the parameters that should be used
+ * when performing compression with a specific algorithm. The simplest
+ * possible compression specification is an integer, which sets the
+ * compression level.
+ *
+ * Otherwise, a compression specification is a comma-separated list of items,
+ * each having the form keyword or keyword=value.
+ *
+ * Currently, the only supported keyword is "level".
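+ *
+ * For example, with algorithm "gzip", the detail strings "5" and
+ * "level=5" are equivalent: both request compression level 5.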
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.c
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/backup_compression.h"
+
+static int expect_integer_value(char *keyword, char *value,
+ bc_specification *result);
+
+/*
+ * Look up a compression algorithm by name. Returns true and sets *algorithm
+ * if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_bc_algorithm(char *name, bc_algorithm *algorithm)
+{
+ if (strcmp(name, "none") == 0)
+ *algorithm = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(name, "gzip") == 0)
+ *algorithm = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(name, "lz4") == 0)
+ *algorithm = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(name, "zstd") == 0)
+ *algorithm = BACKUP_COMPRESSION_ZSTD;
+ else
+ return false;
+ return true;
+}
+
+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+const char *
+get_bc_algorithm_name(bc_algorithm algorithm)
+{
+ switch (algorithm)
+ {
+ case BACKUP_COMPRESSION_NONE:
+ return "none";
+ case BACKUP_COMPRESSION_GZIP:
+ return "gzip";
+ case BACKUP_COMPRESSION_LZ4:
+ return "lz4";
+ case BACKUP_COMPRESSION_ZSTD:
+ return "zstd";
+ /* no default, to provoke compiler warnings if values are added */
+ }
+ Assert(false);
+ return "???"; /* not reached; placate compiler */
+}
+
+/*
+ * Parse a compression specification for a specified algorithm.
+ *
+ * See the file header comments for a brief description of what a compression
+ * specification is expected to look like.
+ *
+ * On return, all fields of the result object will be initialized.
+ * In particular, result->parse_error will be NULL if no errors occurred
+ * during parsing, and will otherwise contain an appropriate error message.
+ * The caller may free this error message string using pfree, if desired.
+ * Note, however, even if there's no parse error, the string might not make
+ * sense: e.g. for gzip, level=12 is not sensible, but it does parse OK.
+ *
+ * Use validate_bc_specification() to find out whether a compression
+ * specification is semantically sensible.
+ */
+void
+parse_bc_specification(bc_algorithm algorithm, char *specification,
+ bc_specification *result)
+{
+ int bare_level;
+ char *bare_level_endp;
+
+ /* Initial setup of result object. */
+ result->algorithm = algorithm;
+ result->options = 0;
+ result->level = -1;
+ result->parse_error = NULL;
+
+ /* If there is no specification, we're done already. */
+ if (specification == NULL)
+ return;
+
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
+ if (specification != bare_level_endp && *bare_level_endp == '\0')
+ {
+ result->level = bare_level;
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ return;
+ }
+
+ /* Look for comma-separated keyword or keyword=value entries. */
+ while (1)
+ {
+ char *kwstart;
+ char *kwend;
+ char *vstart;
+ char *vend;
+ int kwlen;
+ int vlen;
+ bool has_value;
+ char *keyword;
+ char *value;
+
+ /* Figure start, end, and length of next keyword and any value. */
+ kwstart = kwend = specification;
+ while (*kwend != '\0' && *kwend != ',' && *kwend != '=')
+ ++kwend;
+ kwlen = kwend - kwstart;
+ if (*kwend != '=')
+ {
+ vstart = vend = NULL;
+ vlen = 0;
+ has_value = false;
+ }
+ else
+ {
+ vstart = vend = kwend + 1;
+ while (*vend != '\0' && *vend != ',')
+ ++vend;
+ vlen = vend - vstart;
+ has_value = true;
+ }
+
+ /* Reject empty keyword. */
+ if (kwlen == 0)
+ {
+ result->parse_error =
+ pstrdup(_("found empty string where a compression option was expected"));
+ break;
+ }
+
+ /* Extract keyword and value as separate C strings. */
+ keyword = palloc(kwlen + 1);
+ memcpy(keyword, kwstart, kwlen);
+ keyword[kwlen] = '\0';
+ if (!has_value)
+ value = NULL;
+ else
+ {
+ value = palloc(vlen + 1);
+ memcpy(value, vstart, vlen);
+ value[vlen] = '\0';
+ }
+
+ /* Handle whatever keyword we found. */
+ if (strcmp(keyword, "level") == 0)
+ {
+ result->level = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ }
+ else
+ result->parse_error =
+ psprintf(_("unknown compression option \"%s\""), keyword);
+
+ /* Release memory, just to be tidy. */
+ pfree(keyword);
+ if (value != NULL)
+ pfree(value);
+
+ /* If we got an error or have reached the end of the string, stop. */
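+ /* (vend can be NULL only when there was no value, and in that case */
+ /* parse_error was already set above, so *vend is never evaluated.) */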
+ if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+ break;
+
+ /* Advance to next entry and loop around. */
+ specification = vend == NULL ? kwend + 1 : vend + 1;
+ }
+}
+
+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ if (value == NULL)
+ {
+ result->parse_error =
+ psprintf(_("compression option \"%s\" requires a value"),
+ keyword);
+ return -1;
+ }
+
+ ivalue = strtol(value, &ivalue_endp, 10);
+ if (ivalue_endp == value || *ivalue_endp != '\0')
+ {
+ result->parse_error =
+ psprintf(_("value for compression option \"%s\" must be an integer"),
+ keyword);
+ return -1;
+ }
+ return ivalue;
+}
+
+/*
+ * Returns NULL if the compression specification string was syntactically
+ * valid and semantically sensible. Otherwise, returns an error message.
+ *
+ * Does not test whether this build of PostgreSQL supports the requested
+ * compression method.
+ */
+char *
+validate_bc_specification(bc_specification *spec)
+{
+ /* If it didn't even parse OK, it's definitely no good. */
+ if (spec->parse_error != NULL)
+ return spec->parse_error;
+
+ /*
+ * If a compression level was specified, check that the algorithm expects
+ * a compression level and that the level is within the legal range for
+ * the algorithm.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ int min_level = 1;
+ int max_level;
+
+ if (spec->algorithm == BACKUP_COMPRESSION_GZIP)
+ max_level = 9;
+ else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
+ max_level = 12;
+ else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ max_level = 22;
+ else
+ return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
+ get_bc_algorithm_name(spec->algorithm));
+
+ if (spec->level < min_level || spec->level > max_level)
+ return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+ get_bc_algorithm_name(spec->algorithm),
+ min_level, max_level);
+ }
+
+ return NULL;
+}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
new file mode 100644
index 0000000000..0565cbc657
--- /dev/null
+++ b/src/include/common/backup_compression.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.h
+ *
+ * Shared definitions for backup compression methods and specifications.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BACKUP_COMPRESSION_H
+#define BACKUP_COMPRESSION_H
+
+typedef enum bc_algorithm
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
+} bc_algorithm;
+
+#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+
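+/*
+ * For example, parsing "level=5" for gzip yields algorithm =
+ * BACKUP_COMPRESSION_GZIP, options = BACKUP_COMPRESSION_OPTION_LEVEL,
+ * level = 5, and parse_error = NULL.
+ */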
+typedef struct bc_specification
+{
+ bc_algorithm algorithm;
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
+ int level;
+ char *parse_error; /* NULL if parsing was OK, else message */
+} bc_specification;
+
+extern bool parse_bc_algorithm(char *name, bc_algorithm *algorithm);
+extern const char *get_bc_algorithm_name(bc_algorithm algorithm);
+
+extern void parse_bc_specification(bc_algorithm algorithm,
+ char *specification,
+ bc_specification *result);
+
+extern char *validate_bc_specification(bc_specification *);
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a7f16758a4..654df28576 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -27,6 +27,7 @@
#define BASEBACKUP_SINK_H
#include "access/xlog_internal.h"
+#include "common/backup_compression.h"
#include "nodes/pg_list.h"
/* Forward declarations. */
@@ -283,9 +284,9 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
-extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_gzip_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_lz4_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_zstd_new(bbsink *next, bc_specification *);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 441d6ae6bf..de8676d339 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -124,7 +124,7 @@ sub mkvcbuild
}
our @pgcommonallfiles = qw(
- archive.c base64.c checksum_helper.c
+ archive.c backup_compression.c base64.c checksum_helper.c
config_info.c controldata_utils.c d2s.c encnames.c exec.c
f2s.c file_perm.c file_utils.c hashfn.c ip.c jsonapi.c
keywords.c kwlookup.c link-canary.c md5_common.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 93d5190508..1f0d71bc68 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3795,3 +3795,5 @@ yyscan_t
z_stream
z_streamp
zic_t
+bc_algorithm
+bc_specification
--
2.24.3 (Apple Git-128)
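To see how the pieces fit together, here is a minimal illustrative
sketch (not part of the patch) of a frontend caller using the new
bc_* API from backup_compression.h; pg_basebackup's main() follows
the same pattern. The helper name check_compression is hypothetical:

#include "postgres_fe.h"

#include "common/backup_compression.h"
#include "common/logging.h"

/*
 * Hypothetical helper, for illustration only: parse and validate a
 * compression algorithm name plus optional detail string, e.g.
 * "gzip" with detail "level=5".
 */
static bc_specification
check_compression(char *algorithm_name, char *detail)
{
	bc_algorithm alg;
	bc_specification spec;
	char	   *error_detail;

	if (!parse_bc_algorithm(algorithm_name, &alg))
	{
		pg_log_error("unrecognized compression algorithm \"%s\"",
					 algorithm_name);
		exit(1);
	}

	/* detail may be NULL, a bare integer, or keyword[=value] items */
	parse_bc_specification(alg, detail, &spec);
	error_detail = validate_bc_specification(&spec);
	if (error_detail != NULL)
	{
		pg_log_error("invalid compression specification: %s",
					 error_detail);
		exit(1);
	}
	return spec;
}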
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
+ * during parsing, and will otherwise contain a an appropriate error message.
OK, thanks. v4 attached.
I haven't read the whole patch, but I noticed an omission in the
documentation changes:
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
<varlistentry>
- <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <term><literal>COMPRESSION_DETAIL</literal> <replaceable>detail</replaceable></term>
<listitem>
<para>
Specifies the compression level to be used.
This is no longer accurate. How about something like "Specifies
details of the chosen compression method"?
- ilmari
On Mon, Mar 21, 2022 at 2:41 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
This is no longer accurate. How about something like "Specifies
details of the chosen compression method"?
Good catch. v5 attached.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v5-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch (application/octet-stream)
From b7754037c9b5e4c3d8354d7eb919fe5583c08a1b Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 22 Mar 2022 11:31:17 -0400
Subject: [PATCH v5] Replace BASE_BACKUP COMPRESSION_LEVEL option with
COMPRESSION_DETAIL.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.
Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
---
doc/src/sgml/protocol.sgml | 22 +-
doc/src/sgml/ref/pg_basebackup.sgml | 25 +-
src/backend/replication/basebackup.c | 62 +--
src/backend/replication/basebackup_gzip.c | 20 +-
src/backend/replication/basebackup_lz4.c | 19 +-
src/backend/replication/basebackup_zstd.c | 19 +-
src/bin/pg_basebackup/bbstreamer.h | 7 +-
src/bin/pg_basebackup/bbstreamer_gzip.c | 7 +-
src/bin/pg_basebackup/bbstreamer_lz4.c | 4 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 4 +-
src/bin/pg_basebackup/pg_basebackup.c | 409 ++++++++-----------
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 72 +++-
src/common/Makefile | 1 +
src/common/backup_compression.c | 269 ++++++++++++
src/include/common/backup_compression.h | 44 ++
src/include/replication/basebackup_sink.h | 7 +-
src/tools/msvc/Mkvcbuild.pm | 2 +-
src/tools/pgindent/typedefs.list | 2 +
18 files changed, 664 insertions(+), 331 deletions(-)
create mode 100644 src/common/backup_compression.c
create mode 100644 src/include/common/backup_compression.h
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..719b947ef4 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
</varlistentry>
<varlistentry>
- <term><literal>COMPRESSION_LEVEL</literal> <replaceable>level</replaceable></term>
+ <term><literal>COMPRESSION_DETAIL</literal> <replaceable>detail</replaceable></term>
<listitem>
<para>
- Specifies the compression level to be used. This should only be
- used in conjunction with the <literal>COMPRESSION</literal> option.
- For <literal>gzip</literal> the value should be an integer between 1
- and 9, for <literal>lz4</literal> between 1 and 12, and for
- <literal>zstd</literal> it should be between 1 and 22.
+ Specifies details for the chosen compression method. This should only
+ be used in conjunction with the <literal>COMPRESSION</literal>
+ option. If the value is an integer, it specifies the compression
+ level. Otherwise, it should be a comma-separated list of items,
+ each of the form <literal>keyword</literal> or
+ <literal>keyword=value</literal>. Currently, the only supported
+ keyword is <literal>level</literal>, which sets the compression
+ level.
+ </para>
+
+ <para>
+ For <literal>gzip</literal> the compression level should be an
+ integer between 1 and 9, for <literal>lz4</literal> an integer
+ between 1 and 12, and for <literal>zstd</literal> an integer
+ between 1 and 22.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 4a630b59b7..46d7f15e54 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -399,9 +399,9 @@ PostgreSQL documentation
<varlistentry>
<term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
- <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>-Z [{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>detail</replaceable>]</term>
<term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
- <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+ <term><option>--compress=[{client|server}-]<replaceable class="parameter">method</replaceable></option>[:<replaceable>detail</replaceable>]</term>
<listitem>
<para>
Requests compression of the backup. If <literal>client</literal> or
@@ -419,13 +419,20 @@ PostgreSQL documentation
<para>
The compression method can be set to <literal>gzip</literal>,
<literal>lz4</literal>, <literal>zstd</literal>, or
- <literal>none</literal> for no compression. A compression level can
- optionally be specified, by appending the level number after a colon
- (<literal>:</literal>). If no level is specified, the default
- compression level will be used. If only a level is specified without
- mentioning an algorithm, <literal>gzip</literal> compression will be
- used if the level is greater than 0, and no compression will be used if
- the level is 0.
+ <literal>none</literal> for no compression. A compression detail
+ string can optionally be specified. If the detail string is an
+ integer, it specifies the compression level. Otherwise, it should be
+ a comma-separated list of items, each of the form
+ <literal>keyword</literal> or <literal>keyword=value</literal>.
+ Currently, the only supported keyword is <literal>level</literal>,
+ which sets the compression level.
+ </para>
+ <para>
+ If no compression level is specified, the default compression level
+ will be used. If only a level is specified without mentioning an
+ algorithm, <literal>gzip</literal> compression will be used if the
+ level is greater than 0, and no compression will be used if the level
+ is 0.
</para>
<para>
When the tar format is used with <literal>gzip</literal>,
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c2aedc14a2..49deead091 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -17,6 +17,7 @@
#include <time.h>
#include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "commands/defrem.h"
#include "lib/stringinfo.h"
@@ -54,14 +55,6 @@
*/
#define SINK_BUFFER_LENGTH Max(32768, BLCKSZ)
-typedef enum
-{
- BACKUP_COMPRESSION_NONE,
- BACKUP_COMPRESSION_GZIP,
- BACKUP_COMPRESSION_LZ4,
- BACKUP_COMPRESSION_ZSTD
-} basebackup_compression_type;
-
typedef struct
{
const char *label;
@@ -75,8 +68,8 @@ typedef struct
bool use_copytblspc;
BaseBackupTargetHandle *target_handle;
backup_manifest_option manifest;
- basebackup_compression_type compression;
- int compression_level;
+ bc_algorithm compression;
+ bc_specification compression_specification;
pg_checksum_type manifest_checksum_type;
} basebackup_options;
@@ -713,12 +706,14 @@ parse_basebackup_options(List *options, basebackup_options *opt)
char *target_str = NULL;
char *target_detail_str = NULL;
bool o_compression = false;
- bool o_compression_level = false;
+ bool o_compression_detail = false;
+ char *compression_detail_str = NULL;
MemSet(opt, 0, sizeof(*opt));
opt->manifest = MANIFEST_OPTION_NO;
opt->manifest_checksum_type = CHECKSUM_TYPE_CRC32C;
opt->compression = BACKUP_COMPRESSION_NONE;
+ opt->compression_specification.algorithm = BACKUP_COMPRESSION_NONE;
foreach(lopt, options)
{
@@ -885,29 +880,21 @@ parse_basebackup_options(List *options, basebackup_options *opt)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- if (strcmp(optval, "none") == 0)
- opt->compression = BACKUP_COMPRESSION_NONE;
- else if (strcmp(optval, "gzip") == 0)
- opt->compression = BACKUP_COMPRESSION_GZIP;
- else if (strcmp(optval, "lz4") == 0)
- opt->compression = BACKUP_COMPRESSION_LZ4;
- else if (strcmp(optval, "zstd") == 0)
- opt->compression = BACKUP_COMPRESSION_ZSTD;
- else
+ if (!parse_bc_algorithm(optval, &opt->compression))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized compression algorithm: \"%s\"",
+ errmsg("unrecognized compression algorithm \"%s\"",
optval)));
o_compression = true;
}
- else if (strcmp(defel->defname, "compression_level") == 0)
+ else if (strcmp(defel->defname, "compression_detail") == 0)
{
- if (o_compression_level)
+ if (o_compression_detail)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("duplicate option \"%s\"", defel->defname)));
- opt->compression_level = defGetInt32(defel);
- o_compression_level = true;
+ compression_detail_str = defGetString(defel);
+ o_compression_detail = true;
}
else
ereport(ERROR,
@@ -949,10 +936,25 @@ parse_basebackup_options(List *options, basebackup_options *opt)
opt->target_handle =
BaseBackupGetTargetHandle(target_str, target_detail_str);
- if (o_compression_level && !o_compression)
+ if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("compression level requires compression")));
+ errmsg("compression detail requires compression")));
+
+ if (o_compression)
+ {
+ char *error_detail;
+
+ parse_bc_specification(opt->compression, compression_detail_str,
+ &opt->compression_specification);
+ error_detail =
+ validate_bc_specification(&opt->compression_specification);
+ if (error_detail != NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid compression specification: %s",
+ error_detail));
+ }
}
@@ -998,11 +1000,11 @@ SendBaseBackup(BaseBackupCmd *cmd)
/* Set up server-side compression, if client requested it */
if (opt.compression == BACKUP_COMPRESSION_GZIP)
- sink = bbsink_gzip_new(sink, opt.compression_level);
+ sink = bbsink_gzip_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_LZ4)
- sink = bbsink_lz4_new(sink, opt.compression_level);
+ sink = bbsink_lz4_new(sink, &opt.compression_specification);
else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
- sink = bbsink_zstd_new(sink, opt.compression_level);
+ sink = bbsink_zstd_new(sink, &opt.compression_specification);
/* Set up progress reporting. */
sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_gzip.c b/src/backend/replication/basebackup_gzip.c
index b66d3da7a3..703a91ba77 100644
--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_gzip_ops = {
#endif
/*
- * Create a new basebackup sink that performs gzip compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs gzip compression.
*/
bbsink *
-bbsink_gzip_new(bbsink *next, int compresslevel)
+bbsink_gzip_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef HAVE_LIBZ
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,15 +72,14 @@ bbsink_gzip_new(bbsink *next, int compresslevel)
bbsink_gzip *sink;
Assert(next != NULL);
- Assert(compresslevel >= 0 && compresslevel <= 9);
- if (compresslevel == 0)
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
compresslevel = Z_DEFAULT_COMPRESSION;
- else if (compresslevel < 0 || compresslevel > 9)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("gzip compression level %d is out of range",
- compresslevel)));
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 9);
+ }
sink = palloc0(sizeof(bbsink_gzip));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_gzip_ops;
diff --git a/src/backend/replication/basebackup_lz4.c b/src/backend/replication/basebackup_lz4.c
index d838f723d0..06c161ddc4 100644
--- a/src/backend/replication/basebackup_lz4.c
+++ b/src/backend/replication/basebackup_lz4.c
@@ -56,12 +56,13 @@ const bbsink_ops bbsink_lz4_ops = {
#endif
/*
- * Create a new basebackup sink that performs lz4 compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs lz4 compression.
*/
bbsink *
-bbsink_lz4_new(bbsink *next, int compresslevel)
+bbsink_lz4_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_LZ4
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -72,11 +73,13 @@ bbsink_lz4_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 12)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("lz4 compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 12);
+ }
sink = palloc0(sizeof(bbsink_lz4));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_lz4_ops;
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27..96b7985693 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -55,12 +55,13 @@ const bbsink_ops bbsink_zstd_ops = {
#endif
/*
- * Create a new basebackup sink that performs zstd compression using the
- * designated compression level.
+ * Create a new basebackup sink that performs zstd compression.
*/
bbsink *
-bbsink_zstd_new(bbsink *next, int compresslevel)
+bbsink_zstd_new(bbsink *next, bc_specification *compress)
{
+ int compresslevel;
+
#ifndef USE_ZSTD
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -71,11 +72,13 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
Assert(next != NULL);
- if (compresslevel < 0 || compresslevel > 22)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("zstd compression level %d is out of range",
- compresslevel)));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ {
+ compresslevel = compress->level;
+ Assert(compresslevel >= 1 && compresslevel <= 22);
+ }
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index 02d4c05df6..dfa3f77af4 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -22,6 +22,7 @@
#ifndef BBSTREAMER_H
#define BBSTREAMER_H
+#include "common/backup_compression.h"
#include "lib/stringinfo.h"
#include "pqexpbuffer.h"
@@ -200,17 +201,17 @@ bbstreamer_buffer_until(bbstreamer *streamer, const char **data, int *len,
*/
extern bbstreamer *bbstreamer_plain_writer_new(char *pathname, FILE *file);
extern bbstreamer *bbstreamer_gzip_writer_new(char *pathname, FILE *file,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
const char *(*link_map) (const char *),
void (*report_output_file) (const char *));
extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_zstd_compressor_new(bbstreamer *next,
- int compresslevel);
+ bc_specification *compress);
extern bbstreamer *bbstreamer_zstd_decompressor_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 894f857103..1979e95639 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -76,7 +76,8 @@ const bbstreamer_ops bbstreamer_gzip_decompressor_ops = {
* closed so that the data may be written there.
*/
bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
+bbstreamer_gzip_writer_new(char *pathname, FILE *file,
+ bc_specification *compress)
{
#ifdef HAVE_LIBZ
bbstreamer_gzip_writer *streamer;
@@ -115,11 +116,11 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
}
}
- if (gzsetparams(streamer->gzfile, compresslevel,
+ if (gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
- compresslevel, get_gz_error(streamer->gzfile));
+ compress->level, get_gz_error(streamer->gzfile));
exit(1);
}
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index 810052e4e3..a6ec317e2b 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -67,7 +67,7 @@ const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_LZ4
bbstreamer_lz4_frame *streamer;
@@ -89,7 +89,7 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compresslevel;
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb..caa5edcaf1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -63,7 +63,7 @@ const bbstreamer_ops bbstreamer_zstd_decompressor_ops = {
* blocks.
*/
bbstreamer *
-bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
+bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
@@ -85,7 +85,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
/* Initialize stream compression preferences */
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
+ compress->level);
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2943d9ec1a..3e6977df1a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
#include "access/xlog_internal.h"
#include "bbstreamer.h"
+#include "common/backup_compression.h"
#include "common/file_perm.h"
#include "common/file_utils.h"
#include "common/logging.h"
@@ -57,6 +58,7 @@ typedef struct TablespaceList
typedef struct ArchiveStreamState
{
int tablespacenum;
+ bc_specification *compress;
bbstreamer *streamer;
bbstreamer *manifest_inject_streamer;
PQExpBuffer manifest_buffer;
@@ -132,9 +134,6 @@ static bool checksum_failure = false;
static bool showprogress = false;
static bool estimatesize = true;
static int verbose = 0;
-static int compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
-static CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
static IncludeWal includewal = STREAM_WAL;
static bool fastcheckpoint = false;
static bool writerecoveryconf = false;
@@ -198,7 +197,8 @@ static void progress_report(int tablespacenum, bool force, bool finished);
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile);
+ bool expect_unterminated_tarfile,
+ bc_specification *compress);
static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
void *callback_data);
static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
@@ -207,7 +207,7 @@ static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
static void ReportCopyDataParseError(size_t r, char *copybuf);
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum);
+ bool tablespacenum, bc_specification *compress);
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
static void ReceiveBackupManifest(PGconn *conn);
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
@@ -215,7 +215,9 @@ static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
static void ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf);
static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
void *callback_data);
-static void BaseBackup(void);
+static void BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc,
+ bc_specification *client_compress);
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
bool segment_finished);
@@ -405,8 +407,8 @@ usage(void)
printf(_(" -X, --wal-method=none|fetch|stream\n"
" include required WAL files with specified method\n"));
printf(_(" -z, --gzip compress tar output\n"));
- printf(_(" -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"
- " compress tar output with given compression method or level\n"));
+ printf(_(" -Z, --compress=[{client|server}-]METHOD[:DETAIL]\n"
+ " compress on client or server as specified\n"));
printf(_(" -Z, --compress=none do not compress tar output\n"));
printf(_("\nGeneral options:\n"));
printf(_(" -c, --checkpoint=fast|spread\n"
@@ -542,7 +544,9 @@ typedef struct
} logstreamer_param;
static int
-LogStreamerMain(logstreamer_param *param)
+LogStreamerMain(logstreamer_param *param,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
StreamCtl stream;
@@ -565,25 +569,14 @@ LogStreamerMain(logstreamer_param *param)
stream.mark_done = true;
stream.partial_suffix = NULL;
stream.replication_slot = replication_slot;
-
if (format == 'p')
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
COMPRESSION_NONE, 0,
stream.do_sync);
- else if (compressloc != COMPRESS_LOCATION_CLIENT)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
- stream.do_sync);
- else if (compressmethod == COMPRESSION_GZIP)
- stream.walmethod = CreateWalTarMethod(param->xlog,
- compressmethod,
- compresslevel,
- stream.do_sync);
else
stream.walmethod = CreateWalTarMethod(param->xlog,
- COMPRESSION_NONE,
- compresslevel,
+ wal_compress_method,
+ wal_compress_level,
stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
@@ -629,7 +622,9 @@ LogStreamerMain(logstreamer_param *param)
* stream the logfile in parallel with the backups.
*/
static void
-StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
+StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
+ WalCompressionMethod wal_compress_method,
+ int wal_compress_level)
{
logstreamer_param *param;
uint32 hi,
@@ -729,7 +724,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
int ret;
/* in child process */
- ret = LogStreamerMain(param);
+ ret = LogStreamerMain(param, wal_compress_method, wal_compress_level);
/* temp debugging aid to analyze 019_replslot_limit failures */
if (verbose)
@@ -1004,136 +999,81 @@ parse_max_rate(char *src)
}
/*
- * Utility wrapper to parse the values specified for -Z/--compress.
- * *methodres and *levelres will be optionally filled with values coming
- * from the parsed results.
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
+ *
+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwich, the requested algorithm is "turkey"
+ * and the detail string is "sandwich". We'll sort out whether that's legal
+ * at a later stage.
*/
static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
- CompressionLocation *locationres, int *levelres)
+parse_compress_options(char *option, char **algorithm, char **detail,
+ CompressionLocation *locationres)
{
char *sep;
- int firstlen;
- char *firstpart;
+ char *endp;
/*
- * clear 'levelres' so that if there are multiple compression options,
- * the last one fully overrides the earlier ones
- */
- *levelres = 0;
-
- /* check if the option is split in two */
- sep = strchr(src, ':');
-
- /*
- * The first part of the option value could be a method name, or just a
- * level value.
- */
- firstlen = (sep != NULL) ? (sep - src) : strlen(src);
- firstpart = pg_malloc(firstlen + 1);
- memcpy(firstpart, src, firstlen);
- firstpart[firstlen] = '\0';
-
- /*
- * Check if the first part of the string matches with a supported
- * compression method.
+ * Check whether the compression specification consists of a bare integer.
+ *
+ * If so, for backward compatibility, assume gzip.
*/
- if (pg_strcasecmp(firstpart, "gzip") == 0)
+ (void) strtol(option, &endp, 10);
+ if (*endp == '\0')
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ *algorithm = pstrdup("gzip");
+ *detail = pstrdup(option);
+ return;
}
- else if (pg_strcasecmp(firstpart, "client-gzip") == 0)
- {
- *methodres = COMPRESSION_GZIP;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-gzip") == 0)
+
+ /* Strip off any "client-" or "server-" prefix. */
+ if (strncmp(option, "server-", 7) == 0)
{
- *methodres = COMPRESSION_GZIP;
*locationres = COMPRESS_LOCATION_SERVER;
+ option += 7;
}
- else if (pg_strcasecmp(firstpart, "lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-lz4") == 0)
+ else if (strncmp(option, "client-", 7) == 0)
{
- *methodres = COMPRESSION_LZ4;
*locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-lz4") == 0)
- {
- *methodres = COMPRESSION_LZ4;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
- }
- else if (pg_strcasecmp(firstpart, "client-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_CLIENT;
- }
- else if (pg_strcasecmp(firstpart, "server-zstd") == 0)
- {
- *methodres = COMPRESSION_ZSTD;
- *locationres = COMPRESS_LOCATION_SERVER;
- }
- else if (pg_strcasecmp(firstpart, "none") == 0)
- {
- *methodres = COMPRESSION_NONE;
- *locationres = COMPRESS_LOCATION_UNSPECIFIED;
+ option += 7;
}
else
- {
- /*
- * It does not match anything known, so check for the
- * backward-compatible case of only an integer where the implied
- * compression method changes depending on the level value.
- */
- if (!option_parse_int(firstpart, "-Z/--compress", 0,
- INT_MAX, levelres))
- exit(1);
-
- *methodres = (*levelres > 0) ?
- COMPRESSION_GZIP : COMPRESSION_NONE;
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
- free(firstpart);
- return;
- }
-
+ /*
+ * Check whether there is a compression detail following the algorithm
+ * name.
+ */
+ sep = strchr(option, ':');
if (sep == NULL)
{
- /*
- * The caller specified a method without a colon separator, so let any
- * subsequent checks assign a default level.
- */
- free(firstpart);
- return;
+ *algorithm = pstrdup(option);
+ *detail = NULL;
}
-
- /* Check the contents after the colon separator. */
- sep++;
- if (*sep == '\0')
+ else
{
- pg_log_error("no compression level defined for method %s", firstpart);
- exit(1);
- }
+ char *alg;
- /*
- * For any of the methods currently supported, the data after the
- * separator can just be an integer.
- */
- if (!option_parse_int(sep, "-Z/--compress", 0, INT_MAX,
- levelres))
- exit(1);
+ alg = palloc((sep - option) + 1);
+ memcpy(alg, option, sep - option);
+ alg[sep - option] = '\0';
- free(firstpart);
+ *algorithm = alg;
+ *detail = pstrdup(sep + 1);
+ }
}
/*
@@ -1200,7 +1140,8 @@ static bbstreamer *
CreateBackupStreamer(char *archive_name, char *spclocation,
bbstreamer **manifest_inject_streamer_p,
bool is_recovery_guc_supported,
- bool expect_unterminated_tarfile)
+ bool expect_unterminated_tarfile,
+ bc_specification *compress)
{
bbstreamer *streamer = NULL;
bbstreamer *manifest_inject_streamer = NULL;
@@ -1316,32 +1257,28 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
archive_file = NULL;
}
- if (compressmethod == COMPRESSION_NONE ||
- compressloc != COMPRESS_LOCATION_CLIENT)
+ if (compress->algorithm == BACKUP_COMPRESSION_NONE)
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- else if (compressmethod == COMPRESSION_GZIP)
+ else if (compress->algorithm == BACKUP_COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
- archive_file,
- compresslevel);
+ archive_file, compress);
}
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (compress->algorithm == BACKUP_COMPRESSION_LZ4)
{
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_lz4_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_lz4_compressor_new(streamer, compress);
}
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (compress->algorithm == BACKUP_COMPRESSION_ZSTD)
{
strlcat(archive_filename, ".zst", sizeof(archive_filename));
streamer = bbstreamer_plain_writer_new(archive_filename,
archive_file);
- streamer = bbstreamer_zstd_compressor_new(streamer,
- compresslevel);
+ streamer = bbstreamer_zstd_compressor_new(streamer, compress);
}
else
{
@@ -1395,13 +1332,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with archive
* extraction at client then we need to decompress it.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_SERVER)
+ if (format == 'p')
{
- if (compressmethod == COMPRESSION_GZIP)
+ if (is_tar_gz)
streamer = bbstreamer_gzip_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_LZ4)
+ else if (is_tar_lz4)
streamer = bbstreamer_lz4_decompressor_new(streamer);
- else if (compressmethod == COMPRESSION_ZSTD)
+ else if (is_tar_zstd)
streamer = bbstreamer_zstd_decompressor_new(streamer);
}
@@ -1415,13 +1352,14 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* manifest if present - as a single COPY stream.
*/
static void
-ReceiveArchiveStream(PGconn *conn)
+ReceiveArchiveStream(PGconn *conn, bc_specification *compress)
{
ArchiveStreamState state;
/* Set up initial state. */
memset(&state, 0, sizeof(state));
state.tablespacenum = -1;
+ state.compress = compress;
/* All the real work happens in ReceiveArchiveStreamChunk. */
ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
@@ -1542,7 +1480,8 @@ ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
CreateBackupStreamer(archive_name,
spclocation,
&state->manifest_inject_streamer,
- true, false);
+ true, false,
+ state->compress);
}
break;
}
@@ -1743,7 +1682,7 @@ ReportCopyDataParseError(size_t r, char *copybuf)
*/
static void
ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
- bool tablespacenum)
+ bool tablespacenum, bc_specification *compress)
{
WriteTarState state;
bbstreamer *manifest_inject_streamer;
@@ -1759,7 +1698,8 @@ ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
state.streamer = CreateBackupStreamer(archive_name, spclocation,
&manifest_inject_streamer,
is_recovery_guc_supported,
- expect_unterminated_tarfile);
+ expect_unterminated_tarfile,
+ compress);
state.tablespacenum = tablespacenum;
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
progress_update_filename(NULL);
@@ -1902,7 +1842,8 @@ ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
}
static void
-BaseBackup(void)
+BaseBackup(char *compression_algorithm, char *compression_detail,
+ CompressionLocation compressloc, bc_specification *client_compress)
{
PGresult *res;
char *sysidentifier;
@@ -2055,33 +1996,17 @@ BaseBackup(void)
if (compressloc == COMPRESS_LOCATION_SERVER)
{
- char *compressmethodstr = NULL;
-
if (!use_new_option_syntax)
{
pg_log_error("server does not support server-side compression");
exit(1);
}
- switch (compressmethod)
- {
- case COMPRESSION_GZIP:
- compressmethodstr = "gzip";
- break;
- case COMPRESSION_LZ4:
- compressmethodstr = "lz4";
- break;
- case COMPRESSION_ZSTD:
- compressmethodstr = "zstd";
- break;
- default:
- Assert(false);
- break;
- }
AppendStringCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION", compressmethodstr);
- if (compresslevel >= 1) /* not 0 or Z_DEFAULT_COMPRESSION */
- AppendIntegerCommandOption(&buf, use_new_option_syntax,
- "COMPRESSION_LEVEL", compresslevel);
+ "COMPRESSION", compression_algorithm);
+ if (compression_detail != NULL)
+ AppendStringCommandOption(&buf, use_new_option_syntax,
+ "COMPRESSION_DETAIL",
+ compression_detail);
}
if (verbose)
@@ -2207,15 +2132,33 @@ BaseBackup(void)
*/
if (includewal == STREAM_WAL)
{
+ WalCompressionMethod wal_compress_method;
+ int wal_compress_level;
+
if (verbose)
pg_log_info("starting background WAL receiver");
- StartLogStreamer(xlogstart, starttli, sysidentifier);
+
+ if (client_compress->algorithm == BACKUP_COMPRESSION_GZIP)
+ {
+ wal_compress_method = COMPRESSION_GZIP;
+ wal_compress_level =
+ (client_compress->options & BACKUP_COMPRESSION_OPTION_LEVEL)
+ != 0 ? client_compress->level : 0;
+ }
+ else
+ {
+ wal_compress_method = COMPRESSION_NONE;
+ wal_compress_level = 0;
+ }
+
+ StartLogStreamer(xlogstart, starttli, sysidentifier,
+ wal_compress_method, wal_compress_level);
}
if (serverMajor >= 1500)
{
/* Receive a single tar stream with everything. */
- ReceiveArchiveStream(conn);
+ ReceiveArchiveStream(conn, client_compress);
}
else
{
@@ -2244,7 +2187,8 @@ BaseBackup(void)
spclocation = PQgetvalue(res, i, 1);
}
- ReceiveTarFile(conn, archive_name, spclocation, i);
+ ReceiveTarFile(conn, archive_name, spclocation, i,
+ client_compress);
}
/*
@@ -2511,6 +2455,10 @@ main(int argc, char **argv)
int c;
int option_index;
+ char *compression_algorithm = "none";
+ char *compression_detail = NULL;
+ CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
+ bc_specification client_compress;
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
@@ -2616,17 +2564,13 @@ main(int argc, char **argv)
do_sync = false;
break;
case 'z':
-#ifdef HAVE_LIBZ
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- compresslevel = 1; /* will be rejected below */
-#endif
- compressmethod = COMPRESSION_GZIP;
+ compression_algorithm = "gzip";
+ compression_detail = NULL;
compressloc = COMPRESS_LOCATION_UNSPECIFIED;
break;
case 'Z':
- parse_compress_options(optarg, &compressmethod,
- &compressloc, &compresslevel);
+ parse_compress_options(optarg, &compression_algorithm,
+ &compression_detail, &compressloc);
break;
case 'c':
if (pg_strcasecmp(optarg, "fast") == 0)
@@ -2753,12 +2697,11 @@ main(int argc, char **argv)
}
/*
- * If we're compressing the backup and the user has not said where to
- * perform the compression, do it on the client, unless they specified
- * --target, in which case the server is the only choice.
+ * If the user has not specified where to perform backup compression,
+ * default to the client, unless the user specified --target, in which case
+ * the server is the only choice.
*/
- if (compressmethod != COMPRESSION_NONE &&
- compressloc == COMPRESS_LOCATION_UNSPECIFIED)
+ if (compressloc == COMPRESS_LOCATION_UNSPECIFIED)
{
if (backup_target == NULL)
compressloc = COMPRESS_LOCATION_CLIENT;
@@ -2766,6 +2709,40 @@ main(int argc, char **argv)
compressloc = COMPRESS_LOCATION_SERVER;
}
+ /*
+ * If any compression that we're doing is happening on the client side,
+ * we must try to parse the compression algorithm and detail, but if it's
+ * all on the server side, then we're just going to pass through whatever
+ * was requested and let the server decide what to do.
+ */
+ if (compressloc == COMPRESS_LOCATION_CLIENT)
+ {
+ bc_algorithm alg;
+ char *error_detail;
+
+ if (!parse_bc_algorithm(compression_algorithm, &alg))
+ {
+ pg_log_error("unrecognized compression algorithm \"%s\"",
+ compression_algorithm);
+ exit(1);
+ }
+
+ parse_bc_specification(alg, compression_detail, &client_compress);
+ error_detail = validate_bc_specification(&client_compress);
+ if (error_detail != NULL)
+ {
+ pg_log_error("invalid compression specification: %s",
+ error_detail);
+ exit(1);
+ }
+ }
+ else
+ {
+ Assert(compressloc == COMPRESS_LOCATION_SERVER);
+ client_compress.algorithm = BACKUP_COMPRESSION_NONE;
+ client_compress.options = 0;
+ }
+
/*
* Can't perform client-side compression if the backup is not being
* sent to the client.
@@ -2779,9 +2756,10 @@ main(int argc, char **argv)
}
/*
- * Compression doesn't make sense unless tar format is in use.
+ * Client-side compression doesn't make sense unless tar format is in use.
*/
- if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT)
+ if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT &&
+ client_compress.algorithm != BACKUP_COMPRESSION_NONE)
{
pg_log_error("only tar mode backups can be compressed");
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -2882,56 +2860,6 @@ main(int argc, char **argv)
}
}
- /* Sanity checks for compression-related options. */
- switch (compressmethod)
- {
- case COMPRESSION_NONE:
- if (compresslevel != 0)
- {
- pg_log_error("cannot use compression level with method %s",
- "none");
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
- progname);
- exit(1);
- }
- break;
- case COMPRESSION_GZIP:
- if (compresslevel > 9)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 9",
- compresslevel, "gzip");
- exit(1);
- }
- if (compressloc == COMPRESS_LOCATION_CLIENT)
- {
-#ifdef HAVE_LIBZ
- if (compresslevel == 0)
- compresslevel = Z_DEFAULT_COMPRESSION;
-#else
- pg_log_error("this build does not support compression with %s",
- "gzip");
- exit(1);
-#endif
- }
- break;
- case COMPRESSION_LZ4:
- if (compresslevel > 12)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 12",
- compresslevel, "lz4");
- exit(1);
- }
- break;
- case COMPRESSION_ZSTD:
- if (compresslevel > 22)
- {
- pg_log_error("compression level %d of method %s higher than maximum of 22",
- compresslevel, "zstd");
- exit(1);
- }
- break;
- }
-
/*
* Sanity checks for progress reporting options.
*/
@@ -3040,7 +2968,8 @@ main(int argc, char **argv)
free(linkloc);
}
- BaseBackup();
+ BaseBackup(compression_algorithm, compression_detail, compressloc,
+ &client_compress);
success = true;
return 0;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index efefe947d9..2869a239e7 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -42,16 +42,12 @@ $node->command_fails(['pg_basebackup'],
# Sanity checks for options
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:1' ],
- qr/\Qpg_basebackup: error: cannot use compression level with method none/,
+ qr/\Qcompression algorithm "none" does not accept a compression level/,
'failure if method "none" specified with compression level');
$node->command_fails_like(
[ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none+' ],
- qr/\Qpg_basebackup: error: invalid value "none+" for option/,
+ qr/\Qunrecognized compression algorithm "none+"/,
'failure on incorrect separator to define compression level');
-$node->command_fails_like(
- [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', 'none:' ],
- qr/\Qpg_basebackup: error: no compression level defined for method none/,
- 'failure on missing compression level value');
# Some Windows ANSI code pages may reject this filename, in which case we
# quietly proceed without this bit of test coverage.
@@ -89,6 +85,70 @@ print $conf "wal_level = replica\n";
close $conf;
$node->restart;
+# Now that we have a server that supports replication commands, test whether
+# certain invalid compression commands fail on the client side with client-side
+# compression and on the server side with server-side compression.
+my $client_fails =
+ 'pg_basebackup: error: ';
+my $server_fails =
+ 'pg_basebackup: error: could not initiate base backup: ERROR: ';
+my @compression_failure_tests = (
+ [
+ 'extrasquishy',
+ 'unrecognized compression algorithm "extrasquishy"',
+ 'failure on invalid compression algorithm'
+ ],
+ [
+ 'gzip:',
+ 'invalid compression specification: found empty string where a compression option was expected',
+ 'failure on empty compression options list'
+ ],
+ [
+ 'gzip:thunk',
+ 'invalid compression specification: unknown compression option "thunk"',
+ 'failure on unknown compression option'
+ ],
+ [
+ 'gzip:level',
+ 'invalid compression specification: compression option "level" requires a value',
+ 'failure on missing compression level'
+ ],
+ [
+ 'gzip:level=',
+ 'invalid compression specification: value for compression option "level" must be an integer',
+ 'failure on empty compression level'
+ ],
+ [
+ 'gzip:level=high',
+ 'invalid compression specification: value for compression option "level" must be an integer',
+ 'failure on non-numeric compression level'
+ ],
+ [
+ 'gzip:level=236',
+ 'invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9',
+ 'failure on out-of-range compression level'
+ ],
+ [
+ 'gzip:level=9,',
+ 'invalid compression specification: found empty string where a compression option was expected',
+ 'failure on extra, empty compression option'
+ ],
+);
+for my $cft (@compression_failure_tests)
+{
+ my $cfail = quotemeta($client_fails . $cft->[1]);
+ my $sfail = quotemeta($server_fails . $cft->[1]);
+ $node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress', $cft->[0] ],
+ qr/$cfail/,
+ 'client '. $cft->[2]);
+ $node->command_fails_like(
+ [ 'pg_basebackup', '-D', "$tempdir/backup", '--compress',
+ 'server-' . $cft->[0] ],
+ qr/$sfail/,
+ 'server ' . $cft->[2]);
+}
+
# Write some files to test that they are not copied.
foreach my $filename (
qw(backup_label tablespace_map postgresql.auto.conf.tmp
diff --git a/src/common/Makefile b/src/common/Makefile
index 31c0dd366d..f627349835 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -47,6 +47,7 @@ LIBS += $(PTHREAD_LIBS)
OBJS_COMMON = \
archive.o \
+ backup_compression.o \
base64.o \
checksum_helper.o \
config_info.o \
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
new file mode 100644
index 0000000000..fac5de157d
--- /dev/null
+++ b/src/common/backup_compression.c
@@ -0,0 +1,269 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.c
+ *
+ * Shared code for backup compression methods and specifications.
+ *
+ * A compression specification specifies the parameters that should be used
+ * when performing compression with a specific algorithm. The simplest
+ * possible compression specification is an integer, which sets the
+ * compression level.
+ *
+ * Otherwise, a compression specification is a comma-separated list of items,
+ * each having the form keyword or keyword=value.
+ *
+ * Currently, the only supported keyword is "level".
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.c
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/backup_compression.h"
+
+static int expect_integer_value(char *keyword, char *value,
+ bc_specification *result);
+
+/*
+ * Look up a compression algorithm by name. Returns true and sets *algorithm
+ * if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_bc_algorithm(char *name, bc_algorithm *algorithm)
+{
+ if (strcmp(name, "none") == 0)
+ *algorithm = BACKUP_COMPRESSION_NONE;
+ else if (strcmp(name, "gzip") == 0)
+ *algorithm = BACKUP_COMPRESSION_GZIP;
+ else if (strcmp(name, "lz4") == 0)
+ *algorithm = BACKUP_COMPRESSION_LZ4;
+ else if (strcmp(name, "zstd") == 0)
+ *algorithm = BACKUP_COMPRESSION_ZSTD;
+ else
+ return false;
+ return true;
+}
+
+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+const char *
+get_bc_algorithm_name(bc_algorithm algorithm)
+{
+ switch (algorithm)
+ {
+ case BACKUP_COMPRESSION_NONE:
+ return "none";
+ case BACKUP_COMPRESSION_GZIP:
+ return "gzip";
+ case BACKUP_COMPRESSION_LZ4:
+ return "lz4";
+ case BACKUP_COMPRESSION_ZSTD:
+ return "zstd";
+ /* no default, to provoke compiler warnings if values are added */
+ }
+ Assert(false);
+ return "???"; /* not reached; placate non-assert builds */
+}
+
+/*
+ * Parse a compression specification for a specified algorithm.
+ *
+ * See the file header comments for a brief description of what a compression
+ * specification is expected to look like.
+ *
+ * On return, all fields of the result object will be initialized.
+ * In particular, result->parse_error will be NULL if no errors occurred
+ * during parsing, and will otherwise contain an appropriate error message.
+ * The caller may free this error message string using pfree, if desired.
+ * Note, however, even if there's no parse error, the string might not make
+ * sense: e.g. for gzip, level=12 is not sensible, but it does parse OK.
+ *
+ * Use validate_bc_specification() to find out whether a compression
+ * specification is semantically sensible.
+ */
+void
+parse_bc_specification(bc_algorithm algorithm, char *specification,
+ bc_specification *result)
+{
+ int bare_level;
+ char *bare_level_endp;
+
+ /* Initial setup of result object. */
+ result->algorithm = algorithm;
+ result->options = 0;
+ result->level = -1;
+ result->parse_error = NULL;
+
+ /* If there is no specification, we're done already. */
+ if (specification == NULL)
+ return;
+
+ /* As a special case, the specification can be a bare integer. */
+ bare_level = strtol(specification, &bare_level_endp, 10);
+ if (specification != bare_level_endp && *bare_level_endp == '\0')
+ {
+ result->level = bare_level;
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ return;
+ }
+
+ /* Look for comma-separated keyword or keyword=value entries. */
+ while (1)
+ {
+ char *kwstart;
+ char *kwend;
+ char *vstart;
+ char *vend;
+ int kwlen;
+ int vlen;
+ bool has_value;
+ char *keyword;
+ char *value;
+
+ /* Figure start, end, and length of next keyword and any value. */
+ kwstart = kwend = specification;
+ while (*kwend != '\0' && *kwend != ',' && *kwend != '=')
+ ++kwend;
+ kwlen = kwend - kwstart;
+ if (*kwend != '=')
+ {
+ vstart = vend = NULL;
+ vlen = 0;
+ has_value = false;
+ }
+ else
+ {
+ vstart = vend = kwend + 1;
+ while (*vend != '\0' && *vend != ',')
+ ++vend;
+ vlen = vend - vstart;
+ has_value = true;
+ }
+
+ /* Reject empty keyword. */
+ if (kwlen == 0)
+ {
+ result->parse_error =
+ pstrdup(_("found empty string where a compression option was expected"));
+ break;
+ }
+
+ /* Extract keyword and value as separate C strings. */
+ keyword = palloc(kwlen + 1);
+ memcpy(keyword, kwstart, kwlen);
+ keyword[kwlen] = '\0';
+ if (!has_value)
+ value = NULL;
+ else
+ {
+ value = palloc(vlen + 1);
+ memcpy(value, vstart, vlen);
+ value[vlen] = '\0';
+ }
+
+ /* Handle whatever keyword we found. */
+ if (strcmp(keyword, "level") == 0)
+ {
+ result->level = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
+ }
+ else
+ result->parse_error =
+ psprintf(_("unknown compression option \"%s\""), keyword);
+
+ /* Release memory, just to be tidy. */
+ pfree(keyword);
+ if (value != NULL)
+ pfree(value);
+
+ /* If we got an error or have reached the end of the string, stop. */
+ if (result->parse_error != NULL ||
+ (vend == NULL ? *kwend == '\0' : *vend == '\0'))
+ break;
+
+ /* Advance to next entry and loop around. */
+ specification = vend == NULL ? kwend + 1 : vend + 1;
+ }
+}
+
+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)
+{
+ int ivalue;
+ char *ivalue_endp;
+
+ if (value == NULL)
+ {
+ result->parse_error =
+ psprintf(_("compression option \"%s\" requires a value"),
+ keyword);
+ return -1;
+ }
+
+ ivalue = strtol(value, &ivalue_endp, 10);
+ if (ivalue_endp == value || *ivalue_endp != '\0')
+ {
+ result->parse_error =
+ psprintf(_("value for compression option \"%s\" must be an integer"),
+ keyword);
+ return -1;
+ }
+ return ivalue;
+}
+
+/*
+ * Returns NULL if the compression specification string was syntactically
+ * valid and semantically sensible. Otherwise, returns an error message.
+ *
+ * Does not test whether this build of PostgreSQL supports the requested
+ * compression method.
+ */
+char *
+validate_bc_specification(bc_specification *spec)
+{
+ /* If it didn't even parse OK, it's definitely no good. */
+ if (spec->parse_error != NULL)
+ return spec->parse_error;
+
+ /*
+ * If a compression level was specified, check that the algorithm expects
+ * a compression level and that the level is within the legal range for
+ * the algorithm.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ int min_level = 1;
+ int max_level;
+
+ if (spec->algorithm == BACKUP_COMPRESSION_GZIP)
+ max_level = 9;
+ else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
+ max_level = 12;
+ else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ max_level = 22;
+ else
+ return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
+ get_bc_algorithm_name(spec->algorithm));
+
+ if (spec->level < min_level || spec->level > max_level)
+ return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+ get_bc_algorithm_name(spec->algorithm),
+ min_level, max_level);
+ }
+
+ return NULL;
+}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
new file mode 100644
index 0000000000..0565cbc657
--- /dev/null
+++ b/src/include/common/backup_compression.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * backup_compression.h
+ *
+ * Shared definitions for backup compression methods and specifications.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/common/backup_compression.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef BACKUP_COMPRESSION_H
+#define BACKUP_COMPRESSION_H
+
+typedef enum bc_algorithm
+{
+ BACKUP_COMPRESSION_NONE,
+ BACKUP_COMPRESSION_GZIP,
+ BACKUP_COMPRESSION_LZ4,
+ BACKUP_COMPRESSION_ZSTD
+} bc_algorithm;
+
+#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+
+typedef struct bc_specification
+{
+ bc_algorithm algorithm;
+ unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
+ int level;
+ char *parse_error; /* NULL if parsing was OK, else message */
+} bc_specification;
+
+extern bool parse_bc_algorithm(char *name, bc_algorithm *algorithm);
+extern const char *get_bc_algorithm_name(bc_algorithm algorithm);
+
+extern void parse_bc_specification(bc_algorithm algorithm,
+ char *specification,
+ bc_specification *result);
+
+extern char *validate_bc_specification(bc_specification *);
+
+#endif
diff --git a/src/include/replication/basebackup_sink.h b/src/include/replication/basebackup_sink.h
index a7f16758a4..654df28576 100644
--- a/src/include/replication/basebackup_sink.h
+++ b/src/include/replication/basebackup_sink.h
@@ -27,6 +27,7 @@
#define BASEBACKUP_SINK_H
#include "access/xlog_internal.h"
+#include "common/backup_compression.h"
#include "nodes/pg_list.h"
/* Forward declarations. */
@@ -283,9 +284,9 @@ extern void bbsink_forward_cleanup(bbsink *sink);
/* Constructors for various types of sinks. */
extern bbsink *bbsink_copystream_new(bool send_to_client);
-extern bbsink *bbsink_gzip_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_lz4_new(bbsink *next, int compresslevel);
-extern bbsink *bbsink_zstd_new(bbsink *next, int compresslevel);
+extern bbsink *bbsink_gzip_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_lz4_new(bbsink *next, bc_specification *);
+extern bbsink *bbsink_zstd_new(bbsink *next, bc_specification *);
extern bbsink *bbsink_progress_new(bbsink *next, bool estimate_backup_size);
extern bbsink *bbsink_server_new(bbsink *next, char *pathname);
extern bbsink *bbsink_throttle_new(bbsink *next, uint32 maxrate);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 441d6ae6bf..de8676d339 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -124,7 +124,7 @@ sub mkvcbuild
}
our @pgcommonallfiles = qw(
- archive.c base64.c checksum_helper.c
+ archive.c backup_compression.c base64.c checksum_helper.c
config_info.c controldata_utils.c d2s.c encnames.c exec.c
f2s.c file_perm.c file_utils.c hashfn.c ip.c jsonapi.c
keywords.c kwlookup.c link-canary.c md5_common.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 93d5190508..1f0d71bc68 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3795,3 +3795,5 @@ yyscan_t
z_stream
z_streamp
zic_t
+bc_algorithm
+bc_specification
--
2.24.3 (Apple Git-128)
On Tue, Mar 22, 2022 at 11:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 21, 2022 at 2:41 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
This is no longer accurate. How about something like "Specifies
details of the chosen compression method"?
Good catch. v5 attached.
And committed.
--
Robert Haas
EDB: http://www.enterprisedb.com
[ Changing subject line in the hopes of attracting more eyeballs. ]
On Mon, Mar 14, 2022 at 12:11 PM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
I tried to implement support for parallel ZSTD compression.
Here's a new patch for this. It's more of a rewrite than an update,
honestly; commit ffd53659c46a54a6978bcb8c4424c1e157a2c0f1 necessitated
totally different options handling, but I also redid the test cases,
the documentation, and the error message.
For those who may not have been following along, here's an executive
summary: libzstd offers an option for parallel compression. It's
intended to be transparent: you just say you want it, and the library
takes care of it for you. Since we have the ability to do backup
compression on either the client or the server side, we can expose
this option in both locations. That would be cool, because it would
allow for really fast backup compression with a good compression
ratio. It would also mean that we would be, or really libzstd would
be, spawning threads inside the PostgreSQL backend. Short of cats and
dogs living together, it's hard to think of anything more terrifying,
because the PostgreSQL backend is very much not thread-safe. However,
a lot of the things we usually worry about when people make noises
about using threads in the backend don't apply here, because the
threads are hidden away behind libzstd interfaces and can't execute
any PostgreSQL code. Therefore, I think it might be safe to just ...
turn this on. One reason I think that is that this whole approach was
recommended to me by Andres ... but that's not to say that there
couldn't be problems. I worry a bit that the mere presence of threads
could in some way mess things up, but I don't know what the mechanism
for that would be, and I don't want to postpone shipping useful
features based on nebulous fears.
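To make the mechanics concrete: everything above reduces to setting one
extra parameter on the compression context before streaming. Here is a
small self-contained sketch of the libzstd calls involved (not code from
the patch; the helper name is invented, and it assumes out_size is at
least ZSTD_compressBound(in_size)):
#include <stdint.h>
#include <zstd.h>
/*
 * One-shot streaming compression with optional worker threads.  Apart
 * from the ZSTD_c_nbWorkers call, this is identical to the
 * single-threaded case.  Returns bytes written, or SIZE_MAX on error.
 */
static size_t
compress_with_workers(const void *in, size_t in_size,
                      void *out, size_t out_size,
                      int level, int nworkers)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    ZSTD_inBuffer inbuf = {in, in_size, 0};
    ZSTD_outBuffer outbuf = {out, out_size, 0};
    size_t      ret;

    if (cctx == NULL)
        return SIZE_MAX;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

    /* Old or single-threaded builds of libzstd reject this; check it. */
    ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nworkers);
    if (ZSTD_isError(ret))
    {
        ZSTD_freeCCtx(cctx);
        return SIZE_MAX;
    }

    /* ZSTD_e_end: consume all input, flush, and finish the frame. */
    do
    {
        ret = ZSTD_compressStream2(cctx, &outbuf, &inbuf, ZSTD_e_end);
    } while (ret != 0 && !ZSTD_isError(ret));

    ZSTD_freeCCtx(cctx);
    return ZSTD_isError(ret) ? SIZE_MAX : outbuf.pos;
}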
In my ideal world, I'd like to push this into v15. I've done a lot of
work to improve the backup code in this release, and this is actually
a very small change yet one that potentially enables the project to
get a lot more value out of the work that has already been committed.
That said, I also don't want to break the world, so if you have an
idea what this would break, please tell me.
For those curious as to how this affects performance and backup size,
I loaded up the UK land registry database. That creates a 3769MB
database. Then I backed it up using client-side compression and
server-side compression using the various different algorithms that
are supported in the master branch, plus parallel zstd.
no compression: 3.7GB, 9 seconds
gzip: 1.5GB, 140 seconds with server-side, 141 seconds with client-side
lz4: 2.0GB, 13 seconds with server-side, 12 seconds with client-side
For both parallel and non-parallel zstd compression, I see differences
between the compressed size depending on where the compression is
done. I don't know whether this is an expected behavior of the zstd
library or a bug. Both files uncompress OK and pass pg_verifybackup,
but that doesn't mean we're not, for example, selecting different
compression levels where we shouldn't be. I'll try to figure out
what's going on here.
zstd, client-side: 1.7GB, 17 seconds
zstd, server-side: 1.3GB, 25 seconds
parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
Notice that compressing the backup with parallel zstd is actually
faster than taking an uncompressed backup, even though this test is
all being run on the same machine. That's kind of crazy to me: the
parallel compression is so fast that we save more time on I/O than we
spend compressing. This assumes of course that you have plenty of CPU
resources and limited I/O resources, which won't be true for everyone,
but it's not an unusual situation.
I think the documentation changes in this patch might not be quite up
to scratch. I think there's a brewing problem here: as we add more
compression options, whether or not that happens in this release, and
regardless of what specific options we add, the way things are
structured right now, we're going to end up either duplicating a bunch
of stuff between the pg_basebackup documentation and the BASE_BACKUP
documentation, or else one of those places is going to end up lacking
information that someone reading it might like to have. I'm not
exactly sure what to do about this, though.
This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
0001-Allow-parallel-zstd-compression-when-taking-a-base-b.patchapplication/octet-stream; name=0001-Allow-parallel-zstd-compression-when-taking-a-base-b.patchDownload
From bf27b972eaf29c0a40b949eac40150a1d9ee00b0 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 23 Mar 2022 11:00:33 -0400
Subject: [PATCH] Allow parallel zstd compression when taking a base backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.
When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.
Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe.
---
doc/src/sgml/protocol.sgml | 12 +++++--
doc/src/sgml/ref/pg_basebackup.sgml | 4 +--
src/backend/replication/basebackup_zstd.c | 19 +++++++++++
src/bin/pg_basebackup/bbstreamer_zstd.c | 16 +++++++++
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 5 +++
src/bin/pg_verifybackup/t/009_extract.pl | 29 ++++++++++++++--
src/bin/pg_verifybackup/t/010_client_untar.pl | 33 +++++++++++++++++--
src/common/backup_compression.c | 16 +++++++++
src/include/common/backup_compression.h | 2 ++
src/test/perl/PostgreSQL/Test/Cluster.pm | 3 +-
10 files changed, 127 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 719b947ef4..cc03a4587b 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
option. If the value is an integer, it specifies the compression
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
- <literal>keyword=value</literal>. Currently, the only supported
- keyword is <literal>level</literal>, which sets the compression
- level.
+ <literal>keyword=value</literal>. Currently, the supported keywords
+ are <literal>level</literal> and <literal>workers</literal>.
</para>
<para>
+ The <literal>level</literal> keyword sets the compression level.
For <literal>gzip</literal> the compression level should be an
integer between 1 and 9, for <literal>lz4</literal> an integer
between 1 and 12, and for <literal>zstd</literal> an integer
between 1 and 22.
</para>
+
+ <para>
+ The <literal>workers</literal> keyword sets the number of threads
+ that should be used for parallel compression. Parallel compression
+ is supported only for <literal>zstd</literal>.
+ </para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e..82f5f60625 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the only supported keyword is <literal>level</literal>,
- which sets the compression level.
+ Currently, the supported keywords are <literal>level</literal>
+ and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index bb5b668c2a..4835aa70fc 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -28,6 +28,9 @@ typedef struct bbsink_zstd
/* Compression level */
int compresslevel;
+ /* Number of parallel workers. */
+ int workers;
+
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
} bbsink_zstd;
@@ -83,6 +86,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
sink->compresslevel = compresslevel;
+ sink->workers = compress->workers;
return &sink->base;
#endif
@@ -98,6 +102,7 @@ bbsink_zstd_begin_backup(bbsink *sink)
{
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
+ size_t ret;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
@@ -106,6 +111,20 @@ bbsink_zstd_begin_backup(bbsink *sink)
ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
mysink->compresslevel);
+ /*
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ mysink->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ mysink->workers, ZSTD_getErrorName(ret)));
+
/*
* We need our own buffer, because we're going to pass different data to
* the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index caa5edcaf1..e17dfb6bd5 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,6 +67,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
+ size_t ret;
Assert(next != NULL);
@@ -87,6 +88,21 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
compress->level);
+ /*
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 2869a239e7..f074fe19b7 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -133,6 +133,11 @@ my @compression_failure_tests = (
'invalid compression specification: found empty string where a compression option was expected',
'failure on extra, empty compression option'
],
+ [
+ 'gzip:workers=3',
+ 'invalid compression specification: compression algorithm "gzip" does not accept a worker count',
+ 'failure on worker count for gzip'
+ ],
);
for my $cft (@compression_failure_tests)
{
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 9f9cc7540b..e17e7cad51 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -36,6 +36,12 @@ my @test_configuration = (
'compression_method' => 'zstd',
'backup_flags' => ['--compress', 'server-zstd:5'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:workers=3'],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -57,8 +63,27 @@ for my $tc (@test_configuration)
my @verify = ('pg_verifybackup', '-e', $backup_path);
# A backup with a valid compression method should work.
- $primary->command_ok(\@backup,
- "backup done, compression method \"$method\"");
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 2;
+ }
+ else
+ {
+ ok($backup_result, "backup done, compression $method");
+ }
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 487e30e826..5f6a4b9963 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -50,6 +50,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:workers=3'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -70,9 +79,27 @@ for my $tc (@test_configuration)
'pg_basebackup', '-D', $backup_path,
'-Xfetch', '--no-sync', '-cfast', '-Ft');
push @backup, @{$tc->{'backup_flags'}};
- $primary->command_ok(\@backup,
- "client side backup, compression $method");
-
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 3;
+ }
+ else
+ {
+ ok($backup_result, "client side backup, compression $method");
+ }
# Verify that the we got the files we expected.
my $backup_files = join(',',
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c4..969e08cca2 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->level = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
}
+ else if (strcmp(keyword, "workers") == 0)
+ {
+ result->workers = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -266,5 +271,16 @@ validate_bc_specification(bc_specification *spec)
min_level, max_level);
}
+ /*
+ * Of the compression algorithms that we currently support, only zstd
+ * allows parallel workers.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0 &&
+ (spec->algorithm != BACKUP_COMPRESSION_ZSTD))
+ {
+ return psprintf(_("compression algorithm \"%s\" does not accept a worker count"),
+ get_bc_algorithm_name(spec->algorithm));
+ }
+
return NULL;
}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 0565cbc657..6a0ecaa99c 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -23,12 +23,14 @@ typedef enum bc_algorithm
} bc_algorithm;
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
typedef struct bc_specification
{
bc_algorithm algorithm;
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
+ int workers;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index e7b9161137..8d838f7d6d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2503,8 +2503,7 @@ sub run_log
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::run_log(@_);
- return;
+ return PostgreSQL::Test::Utils::run_log(@_);
}
=pod
--
2.24.3 (Apple Git-128)
Hi,
On 2022-03-23 16:34:04 -0400, Robert Haas wrote:
Therefore, I think it might be safe to just ... turn this on. One reason I
think that is that this whole approach was recommended to me by Andres ...
I didn't do a super careful analysis of the issues... But I do think it's
pretty much the one case where it "should" be safe.
The most likely source of problems would be errors thrown while zstd threads are
alive. Should make sure that that can't happen.
What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
is our code ensuring that we don't leak such contexts?
If they're short-lived, are we compressing large enough batches to not waste a
lot of time starting/stopping threads?
but that's not to say that there couldn't be problems. I worry a bit that
the mere presence of threads could in some way mess things up, but I don't
know what the mechanism for that would be, and I don't want to postpone
shipping useful features based on nebulous fears.
One thing that'd be good to test for is cancelling in-progress server-side
compression. And perhaps a few assertions that ensure that we don't escape
with some threads still running. That'd have to be platform dependent, but I
don't see a problem with that in this case.
For both parallel and non-parallel zstd compression, I see differences
between the compressed size depending on where the compression is
done. I don't know whether this is an expected behavior of the zstd
library or a bug. Both files uncompress OK and pass pg_verifybackup,
but that doesn't mean we're not, for example, selecting different
compression levels where we shouldn't be. I'll try to figure out
what's going on here.
zstd, client-side: 1.7GB, 17 seconds
zstd, server-side: 1.3GB, 25 seconds
parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
What causes this fairly massive client-side/server-side size difference?
+ /*
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
Will this cause test failures on systems with older zstd?
Greetings,
Andres Freund
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ mysink->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ mysink->workers, ZSTD_getErrorName(ret)));
Also because the library may not be compiled with threading. A few days ago, I
tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
patch but then couldn't test it, even after trying various versions of the zstd
package and trying to compile it locally. I'll try again soon...
I think you should also test the return value when setting the compress level.
Not only because it's generally a good idea, but also because I suggested to
support negative compression levels. Which weren't allowed before v1.3.4, and
then the range is only defined since 1.3.6 (ZSTD_minCLevel). At some point,
the range may have been -7..22 but now it's -131072..22.
lib/compress/zstd_compress.c:int ZSTD_minCLevel(void) { return (int)-ZSTD_TARGETLENGTH_MAX; }
lib/zstd.h:#define ZSTD_TARGETLENGTH_MAX ZSTD_BLOCKSIZE_MAX
lib/zstd.h:#define ZSTD_BLOCKSIZE_MAX (1<<ZSTD_BLOCKSIZELOG_MAX)
lib/zstd.h:#define ZSTD_BLOCKSIZELOG_MAX 17
; -1<<17
-131072
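If we go that route, one option is to stop hard-coding the bounds and
ask the library for them. A hedged sketch (the helper name is invented,
and ZSTD_minCLevel() is assumed to exist only on 1.3.6 and newer):
#include <stdbool.h>
#include <zstd.h>
/* Hypothetical validation helper: defer the legal range to libzstd. */
static bool
zstd_level_in_range(int level)
{
#if ZSTD_VERSION_NUMBER >= 10306    /* ZSTD_minCLevel() appeared in 1.3.6 */
    int min_level = ZSTD_minCLevel();
#else
    int min_level = 1;
#endif

    /* Level 0 means "library default"; the option syntax expresses that
     * by omitting the level entirely, so reject it here. */
    return level != 0 && level >= min_level && level <= ZSTD_maxCLevel();
}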
Attachments:
0001-pg_basebackup-support-Zstd-negative-compression-leve.txttext/x-diff; charset=us-asciiDownload
From 80f45cfbe13d6fc0f16e49b7ea76f1e50afb632c Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH] pg_basebackup: support Zstd negative compression levels
"higher than maximum" is bogus
TODO: each compression methods should enforce its own levels
---
src/backend/replication/basebackup_zstd.c | 7 +++++--
src/bin/pg_basebackup/bbstreamer_zstd.c | 5 ++++-
src/common/backup_compression.c | 6 +++++-
3 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 4835aa70fca..74681ee3fe8 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -79,7 +79,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
else
{
compresslevel = compress->level;
- Assert(compresslevel >= 1 && compresslevel <= 22);
+ Assert(compresslevel >= -7 && compresslevel <= 22 && compresslevel != 0);
}
sink = palloc0(sizeof(bbsink_zstd));
@@ -108,8 +108,11 @@ bbsink_zstd_begin_backup(bbsink *sink)
if (!mysink->cctx)
elog(ERROR, "could not create zstd compression context");
- ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
mysink->compresslevel);
+ if (ZSTD_isError(ret))
+ elog(ERROR, "could not create zstd compression context: %s",
+ ZSTD_getErrorName(ret));
/*
* We check for failure here because (1) older versions of the library
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e17dfb6bd54..640729003a4 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -85,8 +85,11 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
pg_log_error("could not create zstd compression context");
/* Initialize stream compression preferences */
- ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
compress->level);
+ if (ZSTD_isError(ret))
+ pg_log_error("could not set compression level to: %d: %s",
+ compress->level, ZSTD_getErrorName(ret));
/*
* We check for failure here because (1) older versions of the library
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 969e08cca20..c0eff30024c 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -260,13 +260,17 @@ validate_bc_specification(bc_specification *spec)
else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
max_level = 12;
else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ {
max_level = 22;
+ /* The minimum level depends on the zstd version. */
+ min_level = -7;
+ }
else
return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
get_bc_algorithm_name(spec->algorithm));
if (spec->level < min_level || spec->level > max_level)
- return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+ return psprintf(_("compression algorithm \"%s\" expects a nonzero compression level between %d and %d"),
get_bc_algorithm_name(spec->algorithm),
min_level, max_level);
}
--
2.17.1
On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
The most likely source of problems would be errors thrown while zstd threads are
alive. Should make sure that that can't happen.
What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
is our code ensuring that we don't leak such contexts?
I haven't found any real documentation explaining how libzstd manages
its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.
Anyway, what we have now is a PG_TRY()/PG_CATCH() block around the
code that uses the bbsink, which will cause bbsink_zstd_cleanup() to
get called in the event of an error. That will do ZSTD_freeCCtx().
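In outline, it's shaped like this (a condensed sketch from memory, not
the literal committed code; run_backup() is a hypothetical stand-in for
the real call sites):
extern void run_backup(bbsink *sink);   /* hypothetical */
static void
backup_with_cleanup_guard(bbsink *sink)
{
    PG_TRY();
    {
        /* Push the backup through the bbsink chain. */
        run_backup(sink);
    }
    PG_CATCH();
    {
        /*
         * The zstd sink's cleanup callback frees the ZSTD_CCtx, which
         * is what should reap any worker threads the library started.
         */
        bbsink_cleanup(sink);
        PG_RE_THROW();
    }
    PG_END_TRY();
}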
It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time
and chill, in practice nobody is likely to run BASE_BACKUP and then
keep the connection open for a long time afterward. So it probably
wouldn't really affect resource utilization in real-world scenarios
even if the threads never exited, as long as they didn't, you know,
busy-loop in the background. And I assume the actual library behavior
can't be nearly that bad. This is a pretty mainstream piece of
software.
If they're short-lived, are we compressing large enough batches to not waste a
lot of time starting/stopping threads?
Well, we're using a single ZSTD_CCtx for an entire base backup. Again,
I haven't found documentation explaining with libzstd is actually
doing, but it's hard to see how we could make the batch any bigger
than that. The context gets reset for each new tablespace, which may
or may not do anything to the compression threads.
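(For reference, I believe the per-tablespace reset amounts to the
following call, which ends the current frame but keeps the configured
parameters; that's an assumption on my part, not verified against the
committed code:)
/* Assumed reset between tablespaces: ends the session/frame but keeps
 * parameters such as the level and nbWorkers for the next archive. */
ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);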
but that's not to say that there couldn't be problems. I worry a bit that
the mere presence of threads could in some way mess things up, but I don't
know what the mechanism for that would be, and I don't want to postpone
shipping useful features based on nebulous fears.One thing that'd be good to tests for is cancelling in-progress server-side
compression. And perhaps a few assertions that ensure that we don't escape
with some threads still running. That'd have to be platform dependent, but I
don't see a problem with that in this case.
More specific suggestions, please?
For both parallel and non-parallel zstd compression, I see differences
between the compressed size depending on where the compression is
done. I don't know whether this is an expected behavior of the zstd
library or a bug. Both files uncompress OK and pass pg_verifybackup,
but that doesn't mean we're not, for example, selecting different
compression levels where we shouldn't be. I'll try to figure out
what's going on here.
zstd, client-side: 1.7GB, 17 seconds
zstd, server-side: 1.3GB, 25 seconds
parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
What causes this fairly massive client-side/server-side size difference?
You seem not to have read what I wrote about this exact point in the
text which you quoted.
Will this cause test failures on systems with older zstd?
I put a bunch of logic in the test case to try to avoid that, so
hopefully not, but if it does, we can adjust the logic.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Also because the library may not be compiled with threading. A few days ago, I
tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
patch but then couldn't test it, even after trying various versions of the zstd
package and trying to compile it locally. I'll try again soon...
Ah. Right, I can update the comment to mention that.
I think you should also test the return value when setting the compress level.
Not only because it's generally a good idea, but also because I suggested to
support negative compression levels. Which weren't allowed before v1.3.4, and
then the range is only defined since 1.3.6 (ZSTD_minCLevel). At some point,
the range may have been -7..22 but now it's -131072..22.
Yeah, I was thinking that might be a good change. It would require
adjusting some other code though, because right now only compression
levels 1..22 are accepted anyhow.
lib/compress/zstd_compress.c:int ZSTD_minCLevel(void) { return (int)-ZSTD_TARGETLENGTH_MAX; }
lib/zstd.h:#define ZSTD_TARGETLENGTH_MAX ZSTD_BLOCKSIZE_MAX
lib/zstd.h:#define ZSTD_BLOCKSIZE_MAX (1<<ZSTD_BLOCKSIZELOG_MAX)
lib/zstd.h:#define ZSTD_BLOCKSIZELOG_MAX 17
; -1<<17
-131072
So does that, like, compress the value by making it way bigger? :-)
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Mar 23, 2022 at 04:34:04PM -0400, Robert Haas wrote:
be, spawning threads inside the PostgreSQL backend. Short of cats and
dogs living together, it's hard to think of anything more terrifying,
because the PostgreSQL backend is very much not thread-safe. However,
a lot of the things we usually worry about when people make noises
about using threads in the backend don't apply here, because the
threads are hidden away behind libzstd interfaces and can't execute
any PostgreSQL code. Therefore, I think it might be safe to just ...
turn this on. One reason I think that is that this whole approach was
recommended to me by Andres ... but that's not to say that there
couldn't be problems. I worry a bit that the mere presence of threads
could in some way mess things up, but I don't know what the mechanism
for that would be, and I don't want to postpone shipping useful
features based on nebulous fears.
Note that the PGDG .RPMs and .DEBs are already linked with pthread, via
libxml => liblzma.
$ ldd /usr/pgsql-14/bin/postgres |grep xm
libxml2.so.2 => /lib64/libxml2.so.2 (0x00007faab984e000)
$ objdump -p /lib64/libxml2.so.2 |grep NEED
NEEDED libdl.so.2
NEEDED libz.so.1
NEEDED liblzma.so.5
NEEDED libm.so.6
NEEDED libc.so.6
VERNEED 0x0000000000019218
VERNEEDNUM 0x0000000000000005
$ objdump -p /lib64/liblzma.so.5 |grep NEED
NEEDED libpthread.so.0
Did you try this on windows at all? It's probably no surprise that zstd
implements threading differently there.
Hi,
On 2022-03-23 18:31:12 -0400, Robert Haas wrote:
On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
The most likely source of problems would be errors thrown while zstd threads are
alive. Should make sure that that can't happen.
What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
is our code ensuring that we don't leak such contexts?
I haven't found any real documentation explaining how libzstd manages
its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.
I found the following section in the manual [1]http://facebook.github.io/zstd/zstd_manual.html:
ZSTD_c_nbWorkers=400, /* Select how many threads will be spawned to compress in parallel.
* When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
* ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
* while compression is performed in parallel, within worker thread(s).
* (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
* in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
* More workers improve speed, but also increase memory usage.
* Default value is `0`, aka "single-threaded mode" : no worker is spawned,
* compression is performed inside Caller's thread, and all invocations are blocking */
"ZSTD_compressStream*() consumes input ... immediately gives back control"
pretty much confirms that.
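If that's right, the caller-visible contract looks like the sketch below
(send_to_client() is an invented stand-in and error checking is
omitted): with nbWorkers >= 1, the ZSTD_e_continue calls hand input to
the worker threads and return quickly, and only ZSTD_e_flush or
ZSTD_e_end waits for the workers to drain.
extern void send_to_client(const char *data, size_t len);  /* hypothetical */
static void
feed_chunk(ZSTD_CCtx *cctx, const char *chunk, size_t len,
           char *outbuf, size_t outlen)
{
    ZSTD_inBuffer in = {chunk, len, 0};

    while (in.pos < in.size)
    {
        ZSTD_outBuffer out = {outbuf, outlen, 0};

        /* Queues input for the workers; may return before compressing. */
        ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);

        /* Forward whatever output has become ready so far. */
        if (out.pos > 0)
            send_to_client(outbuf, out.pos);
    }
}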
Do we care about zstd's memory usage here? I think it's OK to mostly ignore
work_mem/maintenance_work_mem here, but I could also see limiting concurrency
so that estimated memory usage would fit into work_mem/maintenance_work_mem.
It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time
and chill, in practice nobody is likely to run BASE_BACKUP and then
keep the connection open for a long time afterward. So it probably
wouldn't really affect resource utilization in real-world scenarios
even if the threads never exited, as long as they didn't, you know,
busy-loop in the background. And I assume the actual library behavior
can't be nearly that bad. This is a pretty mainstream piece of
software.
I'm not really worried about resource utilization, more about the existence of
threads moving us into undefined behaviour territory or such. I don't think
that's possible, but it's IIRC UB to fork() while threads are present and do
pretty much *anything* other than immediately exec*().
but that's not to say that there couldn't be problems. I worry a bit that
the mere presence of threads could in some way mess things up, but I don't
know what the mechanism for that would be, and I don't want to postpone
shipping useful features based on nebulous fears.
One thing that'd be good to test for is cancelling in-progress server-side
compression. And perhaps a few assertions that ensure that we don't escape
with some threads still running. That'd have to be platform dependent, but I
don't see a problem with that in this case.
More specific suggestions, please?
I was thinking of doing something like calling pthread_is_threaded_np() before
and after the zstd section and erroring out if they differ. But I forgot that
that's a mac-ism.
For both parallel and non-parallel zstd compression, I see differences
between the compressed size depending on where the compression is
done. I don't know whether this is an expected behavior of the zstd
library or a bug. Both files uncompress OK and pass pg_verifybackup,
but that doesn't mean we're not, for example, selecting different
compression levels where we shouldn't be. I'll try to figure out
what's going on here.
zstd, client-side: 1.7GB, 17 seconds
zstd, server-side: 1.3GB, 25 seconds
parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
What causes this fairly massive client-side/server-side size difference?
You seem not to have read what I wrote about this exact point in the
text which you quoted.
Somehow not...
Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
one invocation? I recall that there's some differences between basebackup
client / serverside around buffer sizes - but that's before all the recent-ish
changes...
Greetings,
Andres Freund
On 2022-03-23 18:07:01 -0500, Justin Pryzby wrote:
Did you try this on windows at all?
Really should get zstd installed in the windows cf environment...
It's probably no surprise that zstd implements threading differently there.
Worth noting that we have a few of our own threads running on windows already
- so we're guaranteed to build against the threaded standard libraries etc
already.
On Wed, Mar 23, 2022 at 7:07 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Did you try this on windows at all? It's probably no surprise that zstd
implements threading differently there.
I did not. I haven't had a properly functioning Windows development
environment in about a decade.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Robert,
I haven't reviewed the meat of the patch in detail, but I noticed
something in the tests:
Robert Haas <robertmhaas@gmail.com> writes:
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 9f9cc7540b..e17e7cad51 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
[…]
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
[…]
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 487e30e826..5f6a4b9963 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
[…]
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
Per the TAP protocol, every line of non-test-result output should be
prefixed by "# ". The note() function does this for you, see
https://metacpan.org/pod/Test::More#Diagnostics for details.
- ilmari
On Wed, Mar 23, 2022 at 7:31 PM Andres Freund <andres@anarazel.de> wrote:
I found this the following section in the manual [1]:
ZSTD_c_nbWorkers=400, /* Select how many threads will be spawned to compress in parallel.
* When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
* ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
* while compression is performed in parallel, within worker thread(s).
* (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
* in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
* More workers improve speed, but also increase memory usage.
* Default value is `0`, aka "single-threaded mode" : no worker is spawned,
* compression is performed inside Caller's thread, and all invocations are blocking */
"ZSTD_compressStream*() consumes input ... immediately gives back control"
pretty much confirms that.
I saw that too, but I didn't consider it conclusive. It would be nice
if their documentation had a bit more detail on what's really
happening.
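For reference, the asynchronous behavior the manual describes is easy to observe with a small standalone program against the public streaming API. This is only a sketch for experimentation, not code from the patch:

#include <stdio.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
	ZSTD_CCtx  *cctx = ZSTD_createCCtx();
	static char in[1 << 20], out[1 << 20];
	ZSTD_inBuffer ib = {in, sizeof in, 0};
	ZSTD_outBuffer ob = {out, sizeof out, 0};
	size_t		rc;

	memset(in, 'x', sizeof in);
	ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);

	/*
	 * With nbWorkers >= 1, ZSTD_compressStream2() buffers input and returns
	 * quickly while worker threads compress.  Note the caveat from the
	 * manual: if the very first call already passes ZSTD_e_end, libzstd
	 * delegates to ZSTD_compress2(), which blocks.
	 */
	ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

	/* Feed all the input ... */
	rc = ZSTD_compressStream2(cctx, &ob, &ib, ZSTD_e_continue);
	if (!ZSTD_isError(rc))
	{
		/* ... then finish the frame; a return of 0 means fully flushed. */
		do
			rc = ZSTD_compressStream2(cctx, &ob, &ib, ZSTD_e_end);
		while (rc != 0 && !ZSTD_isError(rc));
	}
	if (ZSTD_isError(rc))
	{
		fprintf(stderr, "%s\n", ZSTD_getErrorName(rc));
		return 1;
	}
	printf("in %zu bytes, out %zu bytes\n", ib.pos, ob.pos);
	ZSTD_freeCCtx(cctx);
	return 0;
}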
Do we care about zstd's memory usage here? I think it's OK to mostly ignore
work_mem/maintenance_work_mem here, but I could also see limiting concurrency
so that estimated memory usage would fit into work_mem/maintenance_work_mem.
I think it's possible that we want to do nothing and possible that we
want to do something, but I think it's very unlikely that the thing we
want to do is related to maintenance_work_mem. Say we soft-cap the
compression level to the one which we think will fit within
maintenance_work_mem. I think the most likely outcome is that people
will not get the compression level they request and be confused about
why that has happened. It also seems possible that we'll be wrong
about how much memory will be used - say, because somebody changes the
library behavior in a new release - and will limit it to the wrong
level. If we're going to do anything here, I think it should be to
limit based on the compression level itself and not based on how much
memory we think that level will use.
But that leaves the question of whether we should even try to impose
some kind of limit, and there I'm not sure. It feels like it might be
overengineered, because we're only talking about users who have
replication privileges, and if those accounts are subverted there are
big problems anyway. I think if we imposed a governance system here it
would get very little use. On the other hand, I think that the higher
zstd compression levels of 20+ can actually use a ton of memory, so we
might want to limit access to those somehow. Apparently on the command
line you have to say --ultra -- not sure if there's a corresponding
API call or if that's a guard that's built specifically into the CLI.
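If we did want such a guard, the library can report both the legal level range and a rough per-level memory estimate, so nothing would need to be hard-coded. A sketch, assuming libzstd >= 1.4; note that ZSTD_estimateCCtxSize() lives in the ZSTD_STATIC_LINKING_ONLY section of zstd.h, so relying on it is an assumption about how we'd be willing to build:

#include <stdio.h>
#define ZSTD_STATIC_LINKING_ONLY	/* for ZSTD_estimateCCtxSize() */
#include <zstd.h>

int
main(void)
{
	ZSTD_bounds b = ZSTD_cParam_getBounds(ZSTD_c_compressionLevel);
	int			level;

	printf("supported levels: %d .. %d\n", b.lowerBound, b.upperBound);

	/*
	 * Rough per-context memory estimate by level; the "ultra" levels (20+)
	 * are where this starts to climb steeply.
	 */
	for (level = 19; level <= ZSTD_maxCLevel(); level++)
		printf("level %d: ~%zu bytes\n", level, ZSTD_estimateCCtxSize(level));
	return 0;
}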
Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
one invocation? I recall that there's some differences between basebackup
client / serverside around buffer sizes - but that's before all the recent-ish
changes...
That thought occurred to me too but I haven't investigated yet.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
I think you should also test the return value when setting the compression level.
Not only because it's generally a good idea, but also because I suggested
supporting negative compression levels. These weren't allowed before v1.3.4, and
then the range is only defined since 1.3.6 (ZSTD_minCLevel). At some point,
the range may have been -7..22 but now it's -131072..22.
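A version-proof range check along those lines might look like the sketch below; it assumes a libzstd new enough to expose ZSTD_minCLevel() and ZSTD_maxCLevel():

#include <stdbool.h>
#include <zstd.h>

/*
 * Validate a requested zstd level against the range the linked library
 * actually supports, rather than hard-coding -7..22.  Level 0 is excluded
 * because libzstd interprets it as "use the default level".
 */
static bool
zstd_level_ok(int level)
{
	return level != 0 &&
		level >= ZSTD_minCLevel() &&
		level <= ZSTD_maxCLevel();
}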
Hi,
The attached patch fixes a few goofs around backup compression. It
adds a check that setting the compression level succeeds, although it
does not allow the broader range of compression levels Justin notes
above. That can be done separately, I guess, if we want to do it. It
also fixes the problem that client and server-side zstd compression
don't actually compress equally well; that turned out to be a bug in
the handling of compression options. Finally it adds an exit call to
an unlikely failure case so that we would, if that case should occur,
print a message and exit, rather than the current behavior of printing
a message and then dereferencing a null pointer.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
0001-Fix-a-few-goofs-in-new-backup-compression-code.patch (application/octet-stream)
From 7c04715c6f5410e3be4f62c29edef60401d721a9 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 24 Mar 2022 17:21:11 -0400
Subject: [PATCH] Fix a few goofs in new backup compression code.
When we try to set the zstd compression level either on the client
or on the server, check for errors.
For any algorithm, on the client side, don't try to set the compression
level unless the user specified one. This was visibly broken for
zstd, which managed to set -1 rather than 0 in this case, but tidy
up the code for the other methods, too.
On the client side, if we fail to create a ZSTD_CCtx, exit after
reporting the error. Otherwise we'll dereference a null pointer.
---
src/backend/replication/basebackup_zstd.c | 8 ++++++--
src/bin/pg_basebackup/bbstreamer_gzip.c | 3 ++-
src/bin/pg_basebackup/bbstreamer_lz4.c | 3 ++-
src/bin/pg_basebackup/bbstreamer_zstd.c | 19 +++++++++++++++++--
4 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index bb5b668c2a..5496eaa72b 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -98,13 +98,17 @@ bbsink_zstd_begin_backup(bbsink *sink)
{
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
+ size_t ret;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
elog(ERROR, "could not create zstd compression context");
- ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
- mysink->compresslevel);
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+ if (ZSTD_isError(ret))
+ elog(ERROR, "could not set zstd compression level to %d: %s",
+ mysink->compresslevel, ZSTD_getErrorName(ret));
/*
* We need our own buffer, because we're going to pass different data to
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 1979e95639..760619fcd7 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -116,7 +116,8 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
}
}
- if (gzsetparams(streamer->gzfile, compress->level,
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0 &&
+ gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index a6ec317e2b..67f841d96a 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -89,7 +89,8 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compress->level;
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index caa5edcaf1..7946b6350b 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,6 +67,8 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
+ int compresslevel;
+ size_t ret;
Assert(next != NULL);
@@ -81,11 +83,24 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
streamer->cctx = ZSTD_createCCtx();
if (!streamer->cctx)
+ {
pg_log_error("could not create zstd compression context");
+ exit(1);
+ }
/* Initialize stream compression preferences */
- ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compress->level);
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ compresslevel = compress->level;
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compresslevel);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set zstd compression level to %d: %s",
+ compresslevel, ZSTD_getErrorName(ret));
+ exit(1);
+ }
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
--
2.24.3 (Apple Git-128)
Robert Haas <robertmhaas@gmail.com> writes:
[ v5-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch ]
Coverity has a nitpick about this:
/srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 in parse_bc_specification()
193 /* Advance to next entry and loop around. */
CID 1503251: Null pointer dereferences (REVERSE_INULL)
Null-checking "vend" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
194 specification = vend == NULL ? kwend + 1 : vend + 1;
195 }
196 }
Not sure if you should remove this null-check or add some other ones,
but I think you ought to do one or the other.
regards, tom lane
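The warning class is easier to see in reduced form. A hypothetical sketch, not the actual parse_bc_specification() code:

/*
 * Coverity's REVERSE_INULL: the dereference tells the analyzer that vend
 * cannot be NULL, which makes the subsequent NULL test look like dead code.
 * The fix is either to drop the test or to test before dereferencing.
 */
static char *
advance(char *kwend, char *vend)
{
	*vend = '\0';				/* implies vend != NULL ... */
	return vend == NULL ? kwend + 1 : vend + 1; /* ... so this test is dead */
}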
On Wed, Mar 23, 2022 at 06:57:04PM -0400, Robert Haas wrote:
On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Also because the library may not be compiled with threading. A few days ago, I
tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
patch but then couldn't test it, even after trying various versions of the zstd
package and trying to compile it locally. I'll try again soon...
Ah. Right, I can update the comment to mention that.
Actually, I suggest removing those comments:
| "We check for failure here because..."
That should be the rule rather than the exception, so shouldn't require
justifying why one might check the return value of library and system calls.
In bbsink_zstd_new(), I think you need to check to see if workers were
requested (same as the issue you found with "level"). If someone builds
against a version of zstd which doesn't support some parameter, you'll
currently call SetParameter with that flag anyway, with a default value.
That's not currently breaking anything for me (even though workers=N doesn't
work) but I think it's fragile and could break, maybe when compiled against an
old zstd, or with future options. SetParameter should only be called when the
user requested to set the parameter. I handled that for workers in 003, but
didn't touch "level", which is probably fine, but maybe should change for
consistency.
src/backend/replication/basebackup_zstd.c: elog(ERROR, "could not set zstd compression level to %d: %s",
src/bin/pg_basebackup/bbstreamer_gzip.c: pg_log_error("could not set compression level %d: %s",
src/bin/pg_basebackup/bbstreamer_zstd.c: pg_log_error("could not set compression level to: %d: %s",
I'm not sure why these messages sometimes mention the current compression
method and sometimes don't. I suggest that they shouldn't - errcontext will
have the algorithm, and the user already specified it anyway. It'd allow the
compiler to merge strings.
Here's a patch for zstd --long mode. (I don't actually use pg_basebackup, but
I will want to use long mode with pg_dump). The "strategy" params may also be
interesting, but I haven't played with it. rsyncable is certainly interesting,
but currently an experimental, nonpublic interface - and a good example of why
to not call SetParameter for params which the user didn't specify: PGDG might
eventually compile postgres against a zstd which supports the rsyncable flag. And
someone might install it on a system whose zstd doesn't support rsyncable, but the server
would try to call SetParameter(rsyncable, 0), and the rsyncable ID number
would've changed, so zstd would probably reject it, and basebackup would be
unusable...
$ time src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method=none --no-manifest -Z zstd:long=1 --checkpoint fast |wc -c
4625935
real 0m1,334s
$ time src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method=none --no-manifest -Z zstd:long=0 --checkpoint fast |wc -c
8426516
real 0m0,880s
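At the library level, the knob those two runs toggle is a single context parameter. A sketch, not the patch itself:

#include <stdio.h>
#include <zstd.h>

/*
 * Enable zstd long-distance matching ("long mode") on a compression
 * context.  Older libzstd builds may not know the parameter, so check the
 * return value instead of assuming success.
 */
static int
enable_long_mode(ZSTD_CCtx *cctx)
{
	size_t		rc = ZSTD_CCtx_setParameter(cctx,
											ZSTD_c_enableLongDistanceMatching, 1);

	if (ZSTD_isError(rc))
	{
		fprintf(stderr, "could not enable long mode: %s\n",
				ZSTD_getErrorName(rc));
		return -1;
	}
	return 0;
}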
Attachments:
0001-Fix-a-few-goofs-in-new-backup-compression-code.patch (text/x-diff; charset=us-ascii)
From e73af18e791f784b3853511f10fe9e573984bcf4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 24 Mar 2022 17:21:11 -0400
Subject: [PATCH 1/5] Fix a few goofs in new backup compression code.
When we try to set the zstd compression level either on the client
or on the server, check for errors.
For any algorithm, on the client side, don't try to set the compression
level unless the user specified one. This was visibly broken for
zstd, which managed to set -1 rather than 0 in this case, but tidy
up the code for the other methods, too.
On the client side, if we fail to create a ZSTD_CCtx, exit after
reporting the error. Otherwise we'll dereference a null pointer.
---
src/backend/replication/basebackup_zstd.c | 8 ++++++--
src/bin/pg_basebackup/bbstreamer_gzip.c | 3 ++-
src/bin/pg_basebackup/bbstreamer_lz4.c | 3 ++-
src/bin/pg_basebackup/bbstreamer_zstd.c | 19 +++++++++++++++++--
4 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index bb5b668c2ab..5496eaa72b7 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -98,13 +98,17 @@ bbsink_zstd_begin_backup(bbsink *sink)
{
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
+ size_t ret;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
elog(ERROR, "could not create zstd compression context");
- ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
- mysink->compresslevel);
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ mysink->compresslevel);
+ if (ZSTD_isError(ret))
+ elog(ERROR, "could not set zstd compression level to %d: %s",
+ mysink->compresslevel, ZSTD_getErrorName(ret));
/*
* We need our own buffer, because we're going to pass different data to
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 1979e956399..760619fcd74 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -116,7 +116,8 @@ bbstreamer_gzip_writer_new(char *pathname, FILE *file,
}
}
- if (gzsetparams(streamer->gzfile, compress->level,
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0 &&
+ gzsetparams(streamer->gzfile, compress->level,
Z_DEFAULT_STRATEGY) != Z_OK)
{
pg_log_error("could not set compression level %d: %s",
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
index a6ec317e2bd..67f841d96a9 100644
--- a/src/bin/pg_basebackup/bbstreamer_lz4.c
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -89,7 +89,8 @@ bbstreamer_lz4_compressor_new(bbstreamer *next, bc_specification *compress)
prefs = &streamer->prefs;
memset(prefs, 0, sizeof(LZ4F_preferences_t));
prefs->frameInfo.blockSizeID = LZ4F_max256KB;
- prefs->compressionLevel = compress->level;
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ prefs->compressionLevel = compress->level;
/*
* Find out the compression bound, it specifies the minimum destination
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index caa5edcaf12..7946b6350b6 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,6 +67,8 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
+ int compresslevel;
+ size_t ret;
Assert(next != NULL);
@@ -81,11 +83,24 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
streamer->cctx = ZSTD_createCCtx();
if (!streamer->cctx)
+ {
pg_log_error("could not create zstd compression context");
+ exit(1);
+ }
/* Initialize stream compression preferences */
- ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compress->level);
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
+ compresslevel = 0;
+ else
+ compresslevel = compress->level;
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compresslevel);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set zstd compression level to %d: %s",
+ compresslevel, ZSTD_getErrorName(ret));
+ exit(1);
+ }
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
--
2.17.1
0002-Allow-parallel-zstd-compression-when-taking-a-base-b.patch (text/x-diff; charset=us-ascii)
From e59d9c1cdcf3f109267c12d4a28525f121c69720 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 23 Mar 2022 11:00:33 -0400
Subject: [PATCH 2/5] Allow parallel zstd compression when taking a base
backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.
When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.
Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe.
---
doc/src/sgml/protocol.sgml | 12 +++++--
doc/src/sgml/ref/pg_basebackup.sgml | 4 +--
src/backend/replication/basebackup_zstd.c | 18 ++++++++++
src/bin/pg_basebackup/bbstreamer_zstd.c | 15 +++++++++
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 5 +++
src/bin/pg_verifybackup/t/009_extract.pl | 29 ++++++++++++++--
src/bin/pg_verifybackup/t/010_client_untar.pl | 33 +++++++++++++++++--
src/common/backup_compression.c | 16 +++++++++
src/include/common/backup_compression.h | 2 ++
src/test/perl/PostgreSQL/Test/Cluster.pm | 3 +-
10 files changed, 125 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2fa3cedfe9e..98f0bc3cc34 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
option. If the value is an integer, it specifies the compression
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
- <literal>keyword=value</literal>. Currently, the only supported
- keyword is <literal>level</literal>, which sets the compression
- level.
+ <literal>keyword=value</literal>. Currently, the supported keywords
+ are <literal>level</literal> and <literal>workers</literal>.
</para>
<para>
+ The <literal>level</literal> keyword sets the compression level.
For <literal>gzip</literal> the compression level should be an
integer between 1 and 9, for <literal>lz4</literal> an integer
between 1 and 12, and for <literal>zstd</literal> an integer
between 1 and 22.
</para>
+
+ <para>
+ The <literal>workers</literal> keyword sets the number of threads
+ that should be used for parallel compression. Parallel compression
+ is supported only for <literal>zstd</literal>.
+ </para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e1..82f5f606250 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the only supported keyword is <literal>level</literal>,
- which sets the compression level.
+ Currently, the supported keywords are <literal>level</literal>
+ and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 5496eaa72b7..d6eb0617d8a 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -28,6 +28,9 @@ typedef struct bbsink_zstd
/* Compression level */
int compresslevel;
+ /* Number of parallel workers. */
+ int workers;
+
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
} bbsink_zstd;
@@ -83,6 +86,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
sink->compresslevel = compresslevel;
+ sink->workers = compress->workers;
return &sink->base;
#endif
@@ -110,6 +114,20 @@ bbsink_zstd_begin_backup(bbsink *sink)
elog(ERROR, "could not set zstd compression level to %d: %s",
mysink->compresslevel, ZSTD_getErrorName(ret));
+ /*
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ mysink->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ mysink->workers, ZSTD_getErrorName(ret)));
+
/*
* We need our own buffer, because we're going to pass different data to
* the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 7946b6350b6..20393da595b 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -102,6 +102,21 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
exit(1);
}
+ /*
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 47f3d00ac45..5ba84c22509 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -130,6 +130,11 @@ my @compression_failure_tests = (
'invalid compression specification: found empty string where a compression option was expected',
'failure on extra, empty compression option'
],
+ [
+ 'gzip:workers=3',
+ 'invalid compression specification: compression algorithm "gzip" does not accept a worker count',
+ 'failure on worker count for gzip'
+ ],
);
for my $cft (@compression_failure_tests)
{
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 41a5b370cc5..d6f11b95535 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -34,6 +34,12 @@ my @test_configuration = (
'compression_method' => 'zstd',
'backup_flags' => ['--compress', 'server-zstd:5'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:workers=3'],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -55,8 +61,27 @@ for my $tc (@test_configuration)
my @verify = ('pg_verifybackup', '-e', $backup_path);
# A backup with a valid compression method should work.
- $primary->command_ok(\@backup,
- "backup done, compression method \"$method\"");
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 2;
+ }
+ else
+ {
+ ok($backup_result, "backup done, compression $method");
+ }
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 488a6d1edee..c1cd12cb065 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -49,6 +49,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:workers=3'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -69,9 +78,27 @@ for my $tc (@test_configuration)
'pg_basebackup', '-D', $backup_path,
'-Xfetch', '--no-sync', '-cfast', '-Ft');
push @backup, @{$tc->{'backup_flags'}};
- $primary->command_ok(\@backup,
- "client side backup, compression $method");
-
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 3;
+ }
+ else
+ {
+ ok($backup_result, "client side backup, compression $method");
+ }
# Verify that we got the files we expected.
my $backup_files = join(',',
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c44..969e08cca20 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->level = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
}
+ else if (strcmp(keyword, "workers") == 0)
+ {
+ result->workers = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -266,5 +271,16 @@ validate_bc_specification(bc_specification *spec)
min_level, max_level);
}
+ /*
+ * Of the compression algorithms that we currently support, only zstd
+ * allows parallel workers.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0 &&
+ (spec->algorithm != BACKUP_COMPRESSION_ZSTD))
+ {
+ return psprintf(_("compression algorithm \"%s\" does not accept a worker count"),
+ get_bc_algorithm_name(spec->algorithm));
+ }
+
return NULL;
}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 0565cbc657d..6a0ecaa99c9 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -23,12 +23,14 @@ typedef enum bc_algorithm
} bc_algorithm;
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
typedef struct bc_specification
{
bc_algorithm algorithm;
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
+ int workers;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index bee6aacf47c..b6e33516110 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2502,8 +2502,7 @@ sub run_log
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::run_log(@_);
- return;
+ return PostgreSQL::Test::Utils::run_log(@_);
}
=pod
--
2.17.1
0003-f-workers.patch (text/x-diff; charset=us-ascii)
From b977f6ba8e491145165b9ab9f2f1bd407b4e2d26 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 27 Mar 2022 12:28:32 -0500
Subject: [PATCH 3/5] f!workers
---
src/backend/replication/basebackup_zstd.c | 31 +++++++++++++----------
src/bin/pg_basebackup/bbstreamer_zstd.c | 21 +++++++--------
2 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index d6eb0617d8a..a112d6e181e 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -71,6 +71,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
#else
bbsink_zstd *sink;
int compresslevel;
+ int workers;
Assert(next != NULL);
@@ -82,11 +83,16 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
Assert(compresslevel >= 1 && compresslevel <= 22);
}
+ if (compress->options & BACKUP_COMPRESSION_OPTION_WORKERS)
+ workers = compress->workers;
+ else
+ workers = 0;
+
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
sink->compresslevel = compresslevel;
- sink->workers = compress->workers;
+ sink->workers = workers;
return &sink->base;
#endif
@@ -114,19 +120,16 @@ bbsink_zstd_begin_backup(bbsink *sink)
elog(ERROR, "could not set zstd compression level to %d: %s",
mysink->compresslevel, ZSTD_getErrorName(ret));
- /*
- * We check for failure here because (1) older versions of the library
- * do not support ZSTD_c_nbWorkers and (2) the library might want to
- * reject unreasonable values (though in practice it does not seem to do
- * so).
- */
- ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
- mysink->workers);
- if (ZSTD_isError(ret))
- ereport(ERROR,
- errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("could not set compression worker count to %d: %s",
- mysink->workers, ZSTD_getErrorName(ret)));
+ if (mysink->workers > 0)
+ {
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ mysink->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ mysink->workers, ZSTD_getErrorName(ret)));
+ }
/*
* We need our own buffer, because we're going to pass different data to
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 20393da595b..678af73e6f0 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -102,19 +102,16 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
exit(1);
}
- /*
- * We check for failure here because (1) older versions of the library
- * do not support ZSTD_c_nbWorkers and (2) the library might want to
- * reject unreasonable values (though in practice it does not seem to do
- * so).
- */
- ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
- compress->workers);
- if (ZSTD_isError(ret))
+ if (compress->workers > 0)
{
- pg_log_error("could not set compression worker count to %d: %s",
- compress->workers, ZSTD_getErrorName(ret));
- exit(1);
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
}
/* Initialize the ZSTD output buffer. */
--
2.17.1
0004-basebackup-support-Z-zstd-long.patch (text/x-diff; charset=us-ascii)
From 74124b8d69e5fbe632fd51bff0effec81ebdc806 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 27 Mar 2022 11:55:01 -0500
Subject: [PATCH 4/5] basebackup: support -Z zstd:long
---
doc/src/sgml/protocol.sgml | 10 +++++++++-
doc/src/sgml/ref/pg_basebackup.sgml | 4 ++--
src/backend/replication/basebackup_zstd.c | 21 +++++++++++++++++++++
src/bin/pg_basebackup/bbstreamer_zstd.c | 13 +++++++++++++
src/common/backup_compression.c | 5 +++++
src/include/common/backup_compression.h | 2 ++
6 files changed, 52 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 98f0bc3cc34..80f1a1f9a04 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2740,7 +2740,8 @@ The commands accepted in replication mode are:
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
<literal>keyword=value</literal>. Currently, the supported keywords
- are <literal>level</literal> and <literal>workers</literal>.
+ are <literal>level</literal>, <literal>long</literal>, and
+ <literal>workers</literal>.
</para>
<para>
@@ -2751,6 +2752,13 @@ The commands accepted in replication mode are:
between 1 and 22.
</para>
+ <para>
+ The <literal>long</literal> keyword enables long-distance matching
+ mode, for improved compression ratio, at the expense of higher memory
+ use. Long-distance mode is supported only for
+ <literal>zstd</literal>.
+ </para>
+
<para>
The <literal>workers</literal> keyword sets the number of threads
that should be used for parallel compression. Parallel compression
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 82f5f606250..014c454bfab 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the supported keywords are <literal>level</literal>
- and <literal>workers</literal>.
+ Currently, the supported keywords are <literal>level</literal>,
+ <literal>long</literal>, and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index a112d6e181e..b900604f59f 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -31,6 +31,9 @@ typedef struct bbsink_zstd
/* Number of parallel workers. */
int workers;
+ /* Flags */
+ bool zstd_long;
+
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
} bbsink_zstd;
@@ -72,6 +75,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
bbsink_zstd *sink;
int compresslevel;
int workers;
+ bool zstd_long;
Assert(next != NULL);
@@ -88,11 +92,15 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
else
workers = 0;
+ zstd_long = (compress->options & BACKUP_COMPRESSION_OPTION_ZSTD_LONG) ?
+ compress->zstd_long : false;
+
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
sink->compresslevel = compresslevel;
sink->workers = workers;
+ sink->zstd_long = zstd_long;
return &sink->base;
#endif
@@ -131,6 +139,19 @@ bbsink_zstd_begin_backup(bbsink *sink)
mysink->workers, ZSTD_getErrorName(ret)));
}
+ if (mysink->zstd_long)
+ {
+ ret = ZSTD_CCtx_setParameter(mysink->cctx,
+ ZSTD_c_enableLongDistanceMatching,
+ mysink->zstd_long);
+ fprintf(stderr, "setting LDM %zu\n", ret);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression flag for %s: %s",
+ "long", ZSTD_getErrorName(ret)));
+ }
+
/*
* We need our own buffer, because we're going to pass different data to
* the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 678af73e6f0..3c7396a1373 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -114,6 +114,19 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
}
}
+ if (compress->zstd_long)
+ {
+ ret = ZSTD_CCtx_setParameter(streamer->cctx,
+ ZSTD_c_enableLongDistanceMatching,
+ compress->zstd_long);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression flag for %s: %s",
+ "long", ZSTD_getErrorName(ret));
+ exit(1);
+ }
+ }
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 969e08cca20..f43a5608e65 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -182,6 +182,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->workers = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
}
+ else if (strcmp(keyword, "long") == 0)
+ {
+ result->zstd_long = expect_integer_value(keyword, value, result); // XXX: expect_bool?
+ result->options |= BACKUP_COMPRESSION_OPTION_ZSTD_LONG;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 6a0ecaa99c9..a378631a8da 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -24,6 +24,7 @@ typedef enum bc_algorithm
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
+#define BACKUP_COMPRESSION_OPTION_ZSTD_LONG (1 << 2)
typedef struct bc_specification
{
@@ -31,6 +32,7 @@ typedef struct bc_specification
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
int workers;
+ int zstd_long;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
--
2.17.1
0005-pg_basebackup-support-Zstd-negative-compression-leve.patch (text/x-diff; charset=us-ascii)
From 28c7236534634498265c3e4d6544c836052f009f Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH 5/5] pg_basebackup: support Zstd negative compression levels
"higher than maximum" is bogus
TODO: each compression methods should enforce its own levels
---
src/backend/replication/basebackup_zstd.c | 2 +-
src/bin/pg_basebackup/bbstreamer_zstd.c | 16 +++++++---------
src/common/backup_compression.c | 6 +++++-
3 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index b900604f59f..e18535bcc13 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -84,7 +84,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
else
{
compresslevel = compress->level;
- Assert(compresslevel >= 1 && compresslevel <= 22);
+ Assert(compresslevel >= -7 && compresslevel <= 22 && compresslevel != 0);
}
if (compress->options & BACKUP_COMPRESSION_OPTION_WORKERS)
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 3c7396a1373..31fbf2d0bc3 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -89,16 +89,14 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
}
/* Initialize stream compression preferences */
- if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
- compresslevel = 0;
- else
- compresslevel = compress->level;
- ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
- if (ZSTD_isError(ret))
+
+ if (compress->options & BACKUP_COMPRESSION_OPTION_LEVEL)
{
- pg_log_error("could not set zstd compression level to %d: %s",
- compresslevel, ZSTD_getErrorName(ret));
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compress->level);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression level to: %d: %s",
+ compress->level, ZSTD_getErrorName(ret));
exit(1);
+ }
}
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index f43a5608e65..b568eccd65f 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -265,13 +265,17 @@ validate_bc_specification(bc_specification *spec)
else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
max_level = 12;
else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ {
max_level = 22;
+ /* The minimum level depends on the version.. */
+ min_level = -7;
+ }
else
return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
get_bc_algorithm_name(spec->algorithm));
if (spec->level < min_level || spec->level > max_level)
- return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+ return psprintf(_("compression algorithm \"%s\" expects a nonzero compression level between %d and %d"),
get_bc_algorithm_name(spec->algorithm),
min_level, max_level);
}
--
2.17.1
On Fri, Mar 25, 2022 at 9:23 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
The changes look good to me.
Thanks. Committed.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 24, 2022 at 9:19 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
Per the TAP protocol, every line of non-test-result output should be
prefixed by "# ". The note() function does this for you, see
https://metacpan.org/pod/Test::More#Diagnostics for details.
True, but that also means it shows up in the actual failure message,
which seems too verbose. By just using 'print', it ends up in the log
file if it's needed, but not anywhere else. Maybe there's a better way
to do this, but I don't think using note() is what I want.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Mar 24, 2022 at 9:19 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
Per the TAP protocol, every line of non-test-result output should be
prefixed by "# ". The note() function does this for you, see
https://metacpan.org/pod/Test::More#Diagnostics for details.
True, but that also means it shows up in the actual failure message,
which seems too verbose. By just using 'print', it ends up in the log
file if it's needed, but not anywhere else. Maybe there's a better way
to do this, but I don't think using note() is what I want.
That is the difference between note() and diag(): note() prints to
stdout so is not visible under a non-verbose prove run, while diag()
prints to stderr so it's always visible.
- ilmari
On Mon, Mar 28, 2022 at 12:52 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
True, but that also means it shows up in the actual failure message,
which seems too verbose. By just using 'print', it ends up in the log
file if it's needed, but not anywhere else. Maybe there's a better way
to do this, but I don't think using note() is what I want.
That is the difference between note() and diag(): note() prints to
stdout so is not visible under a non-verbose prove run, while diag()
prints to stderr so it's always visible.
OK, but print doesn't do either of those things. The output only shows
up in the log file, even with --verbose. Here's an example of what the
log file looks like:
# Running: pg_verifybackup -n -m
/Users/rhaas/pgsql/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup/backup_manifest
-e /Users/rhaas/pgsql/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/extracted-backup
backup successfully verified
ok 6 - verify backup, compression gzip
As you can see, there is a line here that does not begin with #. That
line is the standard output of a command that was run by the test
script.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sun, Mar 27, 2022 at 4:50 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Actually, I suggest removing those comments:
| "We check for failure here because..."
That should be the rule rather than the exception, so shouldn't require
justifying why one might check the return value of library and system calls.
I went for modifying the comment rather than removing it. I agree with
you that checking for failure doesn't really require justification,
but I think that in a case like this it is useful to explain what we
know about why it might fail.
In bbsink_zstd_new(), I think you need to check to see if workers were
requested (same as the issue you found with "level").
Fixed.
src/backend/replication/basebackup_zstd.c: elog(ERROR, "could not set zstd compression level to %d: %s",
src/bin/pg_basebackup/bbstreamer_gzip.c: pg_log_error("could not set compression level %d: %s",
src/bin/pg_basebackup/bbstreamer_zstd.c: pg_log_error("could not set compression level to: %d: %s",
I'm not sure why these messages sometimes mention the current compression
method and sometimes don't. I suggest that they shouldn't - errcontext will
have the algorithm, and the user already specified it anyway. It'd allow the
compiler to merge strings.
I don't think that errcontext() helps here. On the client side, it
doesn't exist. On the server side, it's not in use. I do see
STATEMENT: <whatever> in the server log when a replication command
throws a server-side error, which is similar, but pg_basebackup
doesn't display that STATEMENT line. I don't really know how to
balance the legitimate desire for fewer messages against the
also-legitimate desire for clarity about where things are failing. I'm
slightly inclined to think that including the algorithm name is
better, because options are in the end algorithm-specific, but it's
certainly debatable. I would be interested in hearing other
opinions...
Here's an updated and rebased version of my patch.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Allow-parallel-zstd-compression-when-taking-a-bas.patch (application/octet-stream)
From 473e410a7625fe3fb84a34eab594a84fd40bd2a7 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 23 Mar 2022 11:00:33 -0400
Subject: [PATCH v2] Allow parallel zstd compression when taking a base backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.
When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.
Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe.
---
doc/src/sgml/protocol.sgml | 12 +++++--
doc/src/sgml/ref/pg_basebackup.sgml | 4 +--
src/backend/replication/basebackup_zstd.c | 18 ++++++++++
src/bin/pg_basebackup/bbstreamer_zstd.c | 17 ++++++++++
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 5 +++
src/bin/pg_verifybackup/t/009_extract.pl | 29 ++++++++++++++--
src/bin/pg_verifybackup/t/010_client_untar.pl | 33 +++++++++++++++++--
src/common/backup_compression.c | 16 +++++++++
src/include/common/backup_compression.h | 2 ++
src/test/perl/PostgreSQL/Test/Cluster.pm | 3 +-
10 files changed, 127 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2fa3cedfe9..98f0bc3cc3 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
option. If the value is an integer, it specifies the compression
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
- <literal>keyword=value</literal>. Currently, the only supported
- keyword is <literal>level</literal>, which sets the compression
- level.
+ <literal>keyword=value</literal>. Currently, the supported keywords
+ are <literal>level</literal> and <literal>workers</literal>.
</para>
<para>
+ The <literal>level</literal> keyword sets the compression level.
For <literal>gzip</literal> the compression level should be an
integer between 1 and 9, for <literal>lz4</literal> an integer
between 1 and 12, and for <literal>zstd</literal> an integer
between 1 and 22.
</para>
+
+ <para>
+ The <literal>workers</literal> keyword sets the number of threads
+ that should be used for parallel compression. Parallel compression
+ is supported only for <literal>zstd</literal>.
+ </para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e..82f5f60625 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the only supported keyword is <literal>level</literal>,
- which sets the compression level.
+ Currently, the supported keywords are <literal>level</literal>
+ and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 5496eaa72b..d6eb0617d8 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -28,6 +28,9 @@ typedef struct bbsink_zstd
/* Compression level */
int compresslevel;
+ /* Number of parallel workers. */
+ int workers;
+
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
} bbsink_zstd;
@@ -83,6 +86,7 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
sink->compresslevel = compresslevel;
+ sink->workers = compress->workers;
return &sink->base;
#endif
@@ -110,6 +114,20 @@ bbsink_zstd_begin_backup(bbsink *sink)
elog(ERROR, "could not set zstd compression level to %d: %s",
mysink->compresslevel, ZSTD_getErrorName(ret));
+ /*
+ * We check for failure here because (1) older versions of the library
+ * do not support ZSTD_c_nbWorkers and (2) the library might want to
+ * reject unreasonable values (though in practice it does not seem to do
+ * so).
+ */
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ mysink->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ mysink->workers, ZSTD_getErrorName(ret)));
+
/*
* We need our own buffer, because we're going to pass different data to
* the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 7946b6350b..50bae5d4be 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -102,6 +102,23 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
exit(1);
}
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+ {
+ /*
+ * On older versions of libzstd, this option does not exist, and
+ * trying to set it will fail. Similarly for newer versions if they
+ * are compiled without threading support.
+ */
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
+ }
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 47f3d00ac4..5ba84c2250 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -130,6 +130,11 @@ my @compression_failure_tests = (
'invalid compression specification: found empty string where a compression option was expected',
'failure on extra, empty compression option'
],
+ [
+ 'gzip:workers=3',
+ 'invalid compression specification: compression algorithm "gzip" does not accept a worker count',
+ 'failure on worker count for gzip'
+ ],
);
for my $cft (@compression_failure_tests)
{
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 41a5b370cc..d6f11b9553 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -34,6 +34,12 @@ my @test_configuration = (
'compression_method' => 'zstd',
'backup_flags' => ['--compress', 'server-zstd:5'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:workers=3'],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -55,8 +61,27 @@ for my $tc (@test_configuration)
my @verify = ('pg_verifybackup', '-e', $backup_path);
# A backup with a valid compression method should work.
- $primary->command_ok(\@backup,
- "backup done, compression method \"$method\"");
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 2;
+ }
+ else
+ {
+ ok($backup_result, "backup done, compression $method");
+ }
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 488a6d1ede..c1cd12cb06 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -49,6 +49,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:workers=3'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -69,9 +78,27 @@ for my $tc (@test_configuration)
'pg_basebackup', '-D', $backup_path,
'-Xfetch', '--no-sync', '-cfast', '-Ft');
push @backup, @{$tc->{'backup_flags'}};
- $primary->command_ok(\@backup,
- "client side backup, compression $method");
-
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 3;
+ }
+ else
+ {
+ ok($backup_result, "client side backup, compression $method");
+ }
# Verify that we got the files we expected.
my $backup_files = join(',',
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c4..969e08cca2 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->level = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
}
+ else if (strcmp(keyword, "workers") == 0)
+ {
+ result->workers = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -266,5 +271,16 @@ validate_bc_specification(bc_specification *spec)
min_level, max_level);
}
+ /*
+ * Of the compression algorithms that we currently support, only zstd
+ * allows parallel workers.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0 &&
+ (spec->algorithm != BACKUP_COMPRESSION_ZSTD))
+ {
+ return psprintf(_("compression algorithm \"%s\" does not accept a worker count"),
+ get_bc_algorithm_name(spec->algorithm));
+ }
+
return NULL;
}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 0565cbc657..6a0ecaa99c 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -23,12 +23,14 @@ typedef enum bc_algorithm
} bc_algorithm;
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
typedef struct bc_specification
{
bc_algorithm algorithm;
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
+ int workers;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index bee6aacf47..b6e3351611 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2502,8 +2502,7 @@ sub run_log
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::run_log(@_);
- return;
+ return PostgreSQL::Test::Utils::run_log(@_);
}
=pod
--
2.24.3 (Apple Git-128)
On Mon, Mar 28, 2022 at 12:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
Here's an updated and rebased version of my patch.
Well, that only updated the comment on the client side. Let's try again.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Allow-parallel-zstd-compression-when-taking-a-bas.patch (application/octet-stream)
From 29ae6c4909e0c3ce3f66f869f06278fb109749f4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 28 Mar 2022 13:25:44 -0400
Subject: [PATCH v3] Allow parallel zstd compression when taking a base backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.
When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.
Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.
---
doc/src/sgml/protocol.sgml | 12 +++--
doc/src/sgml/ref/pg_basebackup.sgml | 4 +-
src/backend/replication/basebackup_zstd.c | 45 ++++++++++++-------
src/bin/pg_basebackup/bbstreamer_zstd.c | 40 ++++++++++++-----
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 5 +++
src/bin/pg_verifybackup/t/009_extract.pl | 29 +++++++++++-
src/bin/pg_verifybackup/t/010_client_untar.pl | 33 ++++++++++++--
src/common/backup_compression.c | 16 +++++++
src/include/common/backup_compression.h | 2 +
src/test/perl/PostgreSQL/Test/Cluster.pm | 3 +-
10 files changed, 148 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2fa3cedfe9..98f0bc3cc3 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
option. If the value is an integer, it specifies the compression
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
- <literal>keyword=value</literal>. Currently, the only supported
- keyword is <literal>level</literal>, which sets the compression
- level.
+ <literal>keyword=value</literal>. Currently, the supported keywords
+ are <literal>level</literal> and <literal>workers</literal>.
</para>
<para>
+ The <literal>level</literal> keyword sets the compression level.
For <literal>gzip</literal> the compression level should be an
integer between 1 and 9, for <literal>lz4</literal> an integer
between 1 and 12, and for <literal>zstd</literal> an integer
between 1 and 22.
</para>
+
+ <para>
+ The <literal>workers</literal> keyword sets the number of threads
+ that should be used for parallel compression. Parallel compression
+ is supported only for <literal>zstd</literal>.
+ </para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e..82f5f60625 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the only supported keyword is <literal>level</literal>,
- which sets the compression level.
+ Currently, the supported keywords are <literal>level</literal>
+ and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 5496eaa72b..f6876f4811 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -25,8 +25,8 @@ typedef struct bbsink_zstd
/* Common information for all types of sink. */
bbsink base;
- /* Compression level */
- int compresslevel;
+ /* Compression options */
+ bc_specification *compress;
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
@@ -67,22 +67,13 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
return NULL; /* keep compiler quiet */
#else
bbsink_zstd *sink;
- int compresslevel;
Assert(next != NULL);
- if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
- compresslevel = 0;
- else
- {
- compresslevel = compress->level;
- Assert(compresslevel >= 1 && compresslevel <= 22);
- }
-
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
- sink->compresslevel = compresslevel;
+ sink->compress = compress;
return &sink->base;
#endif
@@ -99,16 +90,36 @@ bbsink_zstd_begin_backup(bbsink *sink)
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
size_t ret;
+ bc_specification *compress = mysink->compress;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
elog(ERROR, "could not create zstd compression context");
- ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
- mysink->compresslevel);
- if (ZSTD_isError(ret))
- elog(ERROR, "could not set zstd compression level to %d: %s",
- mysink->compresslevel, ZSTD_getErrorName(ret));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ compress->level);
+ if (ZSTD_isError(ret))
+ elog(ERROR, "could not set zstd compression level to %d: %s",
+ compress->level, ZSTD_getErrorName(ret));
+ }
+
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+ {
+ /*
+ * On older versions of libzstd, this option does not exist, and trying
+ * to set it will fail. Similarly for newer versions if they are
+ * compiled without threading support.
+ */
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret)));
+ }
/*
* We need our own buffer, because we're going to pass different data to
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 7946b6350b..f94c5c041d 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,7 +67,6 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
- int compresslevel;
size_t ret;
Assert(next != NULL);
@@ -88,18 +87,35 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
exit(1);
}
- /* Initialize stream compression preferences */
- if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
- compresslevel = 0;
- else
- compresslevel = compress->level;
- ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
- if (ZSTD_isError(ret))
+ /* Set compression level, if specified */
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
{
- pg_log_error("could not set zstd compression level to %d: %s",
- compresslevel, ZSTD_getErrorName(ret));
- exit(1);
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compress->level);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set zstd compression level to %d: %s",
+ compress->level, ZSTD_getErrorName(ret));
+ exit(1);
+ }
+ }
+
+ /* Set # of workers, if specified */
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+ {
+ /*
+ * On older versions of libzstd, this option does not exist, and
+ * trying to set it will fail. Similarly for newer versions if they
+ * are compiled without threading support.
+ */
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
}
/* Initialize the ZSTD output buffer. */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 47f3d00ac4..5ba84c2250 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -130,6 +130,11 @@ my @compression_failure_tests = (
'invalid compression specification: found empty string where a compression option was expected',
'failure on extra, empty compression option'
],
+ [
+ 'gzip:workers=3',
+ 'invalid compression specification: compression algorithm "gzip" does not accept a worker count',
+ 'failure on worker count for gzip'
+ ],
);
for my $cft (@compression_failure_tests)
{
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 41a5b370cc..d6f11b9553 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -34,6 +34,12 @@ my @test_configuration = (
'compression_method' => 'zstd',
'backup_flags' => ['--compress', 'server-zstd:5'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:workers=3'],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -55,8 +61,27 @@ for my $tc (@test_configuration)
my @verify = ('pg_verifybackup', '-e', $backup_path);
# A backup with a valid compression method should work.
- $primary->command_ok(\@backup,
- "backup done, compression method \"$method\"");
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 2;
+ }
+ else
+ {
+ ok($backup_result, "backup done, compression $method");
+ }
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 488a6d1ede..c1cd12cb06 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -49,6 +49,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:workers=3'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -69,9 +78,27 @@ for my $tc (@test_configuration)
'pg_basebackup', '-D', $backup_path,
'-Xfetch', '--no-sync', '-cfast', '-Ft');
push @backup, @{$tc->{'backup_flags'}};
- $primary->command_ok(\@backup,
- "client side backup, compression $method");
-
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 3;
+ }
+ else
+ {
+ ok($backup_result, "client side backup, compression $method");
+ }
# Verify that the we got the files we expected.
my $backup_files = join(',',
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c4..969e08cca2 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->level = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
}
+ else if (strcmp(keyword, "workers") == 0)
+ {
+ result->workers = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -266,5 +271,16 @@ validate_bc_specification(bc_specification *spec)
min_level, max_level);
}
+ /*
+ * Of the compression algorithms that we currently support, only zstd
+ * allows parallel workers.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0 &&
+ (spec->algorithm != BACKUP_COMPRESSION_ZSTD))
+ {
+ return psprintf(_("compression algorithm \"%s\" does not accept a worker count"),
+ get_bc_algorithm_name(spec->algorithm));
+ }
+
return NULL;
}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 0565cbc657..6a0ecaa99c 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -23,12 +23,14 @@ typedef enum bc_algorithm
} bc_algorithm;
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
typedef struct bc_specification
{
bc_algorithm algorithm;
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
+ int workers;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index bee6aacf47..b6e3351611 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2502,8 +2502,7 @@ sub run_log
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::run_log(@_);
- return;
+ return PostgreSQL::Test::Utils::run_log(@_);
}
=pod
--
2.24.3 (Apple Git-128)
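For concreteness, here is a minimal standalone sketch of the libzstd calls the
patch relies on. This is an illustration, not part of the patch: it compresses
a toy buffer rather than a backup stream. On a libzstd built without threading
support, the ZSTD_c_nbWorkers call fails, which is exactly the case the tests
above treat as "possibly_unsupported":

#include <stdio.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
	ZSTD_CCtx  *cctx = ZSTD_createCCtx();
	const char	src[] = "example payload";
	char		dst[128];
	size_t		ret;

	/* Request 3 worker threads, as --compress zstd:workers=3 would. */
	ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 3);
	if (ZSTD_isError(ret))
		fprintf(stderr, "could not set compression worker count to 3: %s\n",
				ZSTD_getErrorName(ret));

	/* The worker threads are created and managed entirely inside libzstd. */
	ret = ZSTD_compress2(cctx, dst, sizeof(dst), src, strlen(src));
	if (ZSTD_isError(ret))
		fprintf(stderr, "compression failed: %s\n", ZSTD_getErrorName(ret));
	else
		printf("compressed to %zu bytes\n", ret);

	ZSTD_freeCCtx(cctx);
	return 0;
}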
On Sun, Mar 27, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Coverity has a nitpick about this:
/srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 in parse_bc_specification()
193     /* Advance to next entry and loop around. */
CID 1503251: Null pointer dereferences (REVERSE_INULL)
Null-checking "vend" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
194     specification = vend == NULL ? kwend + 1 : vend + 1;
195 }
196 }
Not sure if you should remove this null-check or add some other ones,
but I think you ought to do one or the other.
Yes, I think this is buggy. I think there's only a theoretical bug
right now, because the only keyword we have is "level" and that
requires a value. But if I add an example keyword that does not
require an associated value (as demonstrated in the attached patch)
and do something like pg_basebackup -cfast -D whatever --compress
lz4:example, then the present code will dereference "vend" even though
it's NULL, which is not good. The attached patch also shows how I
think that should be fixed.
As I hope is apparent, the first hunk of this patch is not for commit,
and the second hunk is for commit.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
coverity-backup-compression-fix.patch (application/octet-stream)
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c4..8eb670848b 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,10 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->level = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
}
+ else if (strcmp(keyword, "example") == 0)
+ {
+ /* this is just an example */
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -187,7 +191,8 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
pfree(value);
/* If we got an error or have reached the end of the string, stop. */
- if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+ if (result->parse_error != NULL ||
+ (vend == NULL ? *kwend == '\0' : *vend == '\0'))
break;
/* Advance to next entry and loop around. */
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Mar 27, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Not sure if you should remove this null-check or add some other ones,
but I think you ought to do one or the other.
As I hope is apparent, the first hunk of this patch is not for commit,
and the second hunk is for commit.
Looks plausible to me.
regards, tom lane
On Mon, Mar 28, 2022 at 03:50:50PM -0400, Robert Haas wrote:
On Sun, Mar 27, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Coverity has a nitpick about this:
/srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 in parse_bc_specification()
193     /* Advance to next entry and loop around. */
CID 1503251: Null pointer dereferences (REVERSE_INULL)
Null-checking "vend" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
194     specification = vend == NULL ? kwend + 1 : vend + 1;
195 }
196 }
Not sure if you should remove this null-check or add some other ones,
but I think you ought to do one or the other.
Yes, I think this is buggy. I think there's only a theoretical bug
right now, because the only keyword we have is "level" and that
requires a value. But if I add an example keyword that does not
require an associated value (as demonstrated in the attached patch)
and do something like pg_basebackup -cfast -D whatever --compress
lz4:example, then the present code will dereference "vend" even though
it's NULL, which is not good. The attached patch also shows how I
think that should be fixed.
As I hope is apparent, the first hunk of this patch is not for commit,
and the second hunk is for commit.
Confirmed that it's a real issue with my patch for zstd long match mode. But
you need to specify another option after the value-less flag option for it to
crash.
I suggest to write it differently, as in 0002.
This also fixes some rebase-induced errors with my previous patches, and adds
expect_boolean().
Attachments:
0001-Allow-parallel-zstd-compression-when-taking-a-base-b.patch (text/x-diff; charset=us-ascii)
From 9bedbfc6bfa471473a8b3479ffd1888d5da285ab Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 28 Mar 2022 13:25:44 -0400
Subject: [PATCH 1/4] Allow parallel zstd compression when taking a base
backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.
When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.
Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.
---
doc/src/sgml/protocol.sgml | 12 +++--
doc/src/sgml/ref/pg_basebackup.sgml | 4 +-
src/backend/replication/basebackup_zstd.c | 45 ++++++++++++-------
src/bin/pg_basebackup/bbstreamer_zstd.c | 40 ++++++++++++-----
src/bin/pg_basebackup/t/010_pg_basebackup.pl | 5 +++
src/bin/pg_verifybackup/t/009_extract.pl | 29 +++++++++++-
src/bin/pg_verifybackup/t/010_client_untar.pl | 33 ++++++++++++--
src/common/backup_compression.c | 16 +++++++
src/include/common/backup_compression.h | 2 +
src/test/perl/PostgreSQL/Test/Cluster.pm | 3 +-
10 files changed, 148 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2fa3cedfe9e..98f0bc3cc34 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
option. If the value is an integer, it specifies the compression
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
- <literal>keyword=value</literal>. Currently, the only supported
- keyword is <literal>level</literal>, which sets the compression
- level.
+ <literal>keyword=value</literal>. Currently, the supported keywords
+ are <literal>level</literal> and <literal>workers</literal>.
</para>
<para>
+ The <literal>level</literal> keyword sets the compression level.
For <literal>gzip</literal> the compression level should be an
integer between 1 and 9, for <literal>lz4</literal> an integer
between 1 and 12, and for <literal>zstd</literal> an integer
between 1 and 22.
</para>
+
+ <para>
+ The <literal>workers</literal> keyword sets the number of threads
+ that should be used for parallel compression. Parallel compression
+ is supported only for <literal>zstd</literal>.
+ </para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e1..82f5f606250 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the only supported keyword is <literal>level</literal>,
- which sets the compression level.
+ Currently, the supported keywords are <literal>level</literal>
+ and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 5496eaa72b7..f6876f48118 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -25,8 +25,8 @@ typedef struct bbsink_zstd
/* Common information for all types of sink. */
bbsink base;
- /* Compression level */
- int compresslevel;
+ /* Compression options */
+ bc_specification *compress;
ZSTD_CCtx *cctx;
ZSTD_outBuffer zstd_outBuf;
@@ -67,22 +67,13 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
return NULL; /* keep compiler quiet */
#else
bbsink_zstd *sink;
- int compresslevel;
Assert(next != NULL);
- if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
- compresslevel = 0;
- else
- {
- compresslevel = compress->level;
- Assert(compresslevel >= 1 && compresslevel <= 22);
- }
-
sink = palloc0(sizeof(bbsink_zstd));
*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
sink->base.bbs_next = next;
- sink->compresslevel = compresslevel;
+ sink->compress = compress;
return &sink->base;
#endif
@@ -99,16 +90,36 @@ bbsink_zstd_begin_backup(bbsink *sink)
bbsink_zstd *mysink = (bbsink_zstd *) sink;
size_t output_buffer_bound;
size_t ret;
+ bc_specification *compress = mysink->compress;
mysink->cctx = ZSTD_createCCtx();
if (!mysink->cctx)
elog(ERROR, "could not create zstd compression context");
- ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
- mysink->compresslevel);
- if (ZSTD_isError(ret))
- elog(ERROR, "could not set zstd compression level to %d: %s",
- mysink->compresslevel, ZSTD_getErrorName(ret));
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+ {
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+ compress->level);
+ if (ZSTD_isError(ret))
+ elog(ERROR, "could not set zstd compression level to %d: %s",
+ compress->level, ZSTD_getErrorName(ret));
+ }
+
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+ {
+ /*
+ * On older versions of libzstd, this option does not exist, and trying
+ * to set it will fail. Similarly for newer versions if they are
+ * compiled without threading support.
+ */
+ ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret)));
+ }
/*
* We need our own buffer, because we're going to pass different data to
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 7946b6350b6..f94c5c041d3 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,7 +67,6 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
#ifdef USE_ZSTD
bbstreamer_zstd_frame *streamer;
- int compresslevel;
size_t ret;
Assert(next != NULL);
@@ -88,18 +87,35 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
exit(1);
}
- /* Initialize stream compression preferences */
- if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
- compresslevel = 0;
- else
- compresslevel = compress->level;
- ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
- compresslevel);
- if (ZSTD_isError(ret))
+ /* Set compression level, if specified */
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
{
- pg_log_error("could not set zstd compression level to %d: %s",
- compresslevel, ZSTD_getErrorName(ret));
- exit(1);
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+ compress->level);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set zstd compression level to %d: %s",
+ compress->level, ZSTD_getErrorName(ret));
+ exit(1);
+ }
+ }
+
+ /* Set # of workers, if specified */
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+ {
+ /*
+ * On older versions of libzstd, this option does not exist, and
+ * trying to set it will fail. Similarly for newer versions if they
+ * are compiled without threading support.
+ */
+ ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+ compress->workers);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression worker count to %d: %s",
+ compress->workers, ZSTD_getErrorName(ret));
+ exit(1);
+ }
}
/* Initialize the ZSTD output buffer. */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 47f3d00ac45..5ba84c22509 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -130,6 +130,11 @@ my @compression_failure_tests = (
'invalid compression specification: found empty string where a compression option was expected',
'failure on extra, empty compression option'
],
+ [
+ 'gzip:workers=3',
+ 'invalid compression specification: compression algorithm "gzip" does not accept a worker count',
+ 'failure on worker count for gzip'
+ ],
);
for my $cft (@compression_failure_tests)
{
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 41a5b370cc5..d6f11b95535 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -34,6 +34,12 @@ my @test_configuration = (
'compression_method' => 'zstd',
'backup_flags' => ['--compress', 'server-zstd:5'],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'server-zstd:workers=3'],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -55,8 +61,27 @@ for my $tc (@test_configuration)
my @verify = ('pg_verifybackup', '-e', $backup_path);
# A backup with a valid compression method should work.
- $primary->command_ok(\@backup,
- "backup done, compression method \"$method\"");
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 2;
+ }
+ else
+ {
+ ok($backup_result, "backup done, compression $method");
+ }
# Make sure that it verifies OK.
$primary->command_ok(\@verify,
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 488a6d1edee..c1cd12cb065 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -49,6 +49,15 @@ my @test_configuration = (
'decompress_program' => $ENV{'ZSTD'},
'decompress_flags' => [ '-d' ],
'enabled' => check_pg_config("#define USE_ZSTD 1")
+ },
+ {
+ 'compression_method' => 'parallel zstd',
+ 'backup_flags' => ['--compress', 'client-zstd:workers=3'],
+ 'backup_archive' => 'base.tar.zst',
+ 'decompress_program' => $ENV{'ZSTD'},
+ 'decompress_flags' => [ '-d' ],
+ 'enabled' => check_pg_config("#define USE_ZSTD 1"),
+ 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
}
);
@@ -69,9 +78,27 @@ for my $tc (@test_configuration)
'pg_basebackup', '-D', $backup_path,
'-Xfetch', '--no-sync', '-cfast', '-Ft');
push @backup, @{$tc->{'backup_flags'}};
- $primary->command_ok(\@backup,
- "client side backup, compression $method");
-
+ my $backup_stdout = '';
+ my $backup_stderr = '';
+ my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+ '2>', \$backup_stderr);
+ if ($backup_stdout ne '')
+ {
+ print "# standard output was:\n$backup_stdout";
+ }
+ if ($backup_stderr ne '')
+ {
+ print "# standard error was:\n$backup_stderr";
+ }
+ if (! $backup_result && $tc->{'possibly_unsupported'} &&
+ $backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+ {
+ skip "compression with $method not supported by this build", 3;
+ }
+ else
+ {
+ ok($backup_result, "client side backup, compression $method");
+ }
# Verify that the we got the files we expected.
my $backup_files = join(',',
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c44..969e08cca20 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->level = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
}
+ else if (strcmp(keyword, "workers") == 0)
+ {
+ result->workers = expect_integer_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -266,5 +271,16 @@ validate_bc_specification(bc_specification *spec)
min_level, max_level);
}
+ /*
+ * Of the compression algorithms that we currently support, only zstd
+ * allows parallel workers.
+ */
+ if ((spec->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0 &&
+ (spec->algorithm != BACKUP_COMPRESSION_ZSTD))
+ {
+ return psprintf(_("compression algorithm \"%s\" does not accept a worker count"),
+ get_bc_algorithm_name(spec->algorithm));
+ }
+
return NULL;
}
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 0565cbc657d..6a0ecaa99c9 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -23,12 +23,14 @@ typedef enum bc_algorithm
} bc_algorithm;
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
+#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
typedef struct bc_specification
{
bc_algorithm algorithm;
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
+ int workers;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index bee6aacf47c..b6e33516110 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2502,8 +2502,7 @@ sub run_log
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::run_log(@_);
- return;
+ return PostgreSQL::Test::Utils::run_log(@_);
}
=pod
--
2.17.1
0002-Avoid-crash-on-backup-connection-strings-with-flags-.patch (text/x-diff; charset=us-ascii)
From a0c100c4473863335dc54ffc6167669cdc858096 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Mon, 28 Mar 2022 15:16:50 -0500
Subject: [PATCH 2/4] Avoid crash on backup connection strings with flags with
no value
---
src/common/backup_compression.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 969e08cca20..477dc7eb49b 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -192,7 +192,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
pfree(value);
/* If we got an error or have reached the end of the string, stop. */
- if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+ if (result->parse_error != NULL)
+ break;
+ if (*kwend == '\0')
+ break;
+ if (vend != NULL && *vend == '\0')
break;
/* Advance to next entry and loop around. */
--
2.17.1
0003-basebackup-support-Z-zstd-long.patch (text/x-diff; charset=us-ascii)
From f0e5ee4d78dce6bc4d111b8b574c6b75f546ee4a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 27 Mar 2022 11:55:01 -0500
Subject: [PATCH 3/4] basebackup: support -Z zstd:long
---
doc/src/sgml/protocol.sgml | 10 +++++-
doc/src/sgml/ref/pg_basebackup.sgml | 4 +--
src/backend/replication/basebackup_zstd.c | 12 +++++++
src/bin/pg_basebackup/bbstreamer_zstd.c | 13 +++++++
src/common/backup_compression.c | 44 +++++++++++++++++++++++
src/include/common/backup_compression.h | 2 ++
6 files changed, 82 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 98f0bc3cc34..80f1a1f9a04 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2740,7 +2740,8 @@ The commands accepted in replication mode are:
level. Otherwise, it should be a comma-separated list of items,
each of the form <literal>keyword</literal> or
<literal>keyword=value</literal>. Currently, the supported keywords
- are <literal>level</literal> and <literal>workers</literal>.
+ are <literal>level</literal>, <literal>long</literal>, and
+ <literal>workers</literal>.
</para>
<para>
@@ -2751,6 +2752,13 @@ The commands accepted in replication mode are:
between 1 and 22.
</para>
+ <para>
+ The <literal>long</literal> keyword enables long-distance matching
+ mode, for improved compression ratio, at the expense of higher memory
+ use. Long-distance mode is supported only for
+ <literal>zstd</literal>.
+ </para>
+
<para>
The <literal>workers</literal> keyword sets the number of threads
that should be used for parallel compression. Parallel compression
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 82f5f606250..014c454bfab 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
integer, it specifies the compression level. Otherwise, it should be
a comma-separated list of items, each of the form
<literal>keyword</literal> or <literal>keyword=value</literal>.
- Currently, the supported keywords are <literal>level</literal>
- and <literal>workers</literal>.
+ Currently, the supported keywords are <literal>level</literal>,
+ <literal>long</literal>, and <literal>workers</literal>.
</para>
<para>
If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index f6876f48118..dc23898f7fd 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -121,6 +121,18 @@ bbsink_zstd_begin_backup(bbsink *sink)
compress->workers, ZSTD_getErrorName(ret)));
}
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_ZSTD_LONG) != 0)
+ {
+ ret = ZSTD_CCtx_setParameter(mysink->cctx,
+ ZSTD_c_enableLongDistanceMatching,
+ compress->zstd_long);
+ if (ZSTD_isError(ret))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("could not set compression flag for %s: %s",
+ "long", ZSTD_getErrorName(ret)));
+ }
+
/*
* We need our own buffer, because we're going to pass different data to
* the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index f94c5c041d3..051b97458ba 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -118,6 +118,19 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
}
}
+ if ((compress->options & BACKUP_COMPRESSION_OPTION_ZSTD_LONG) != 0)
+ {
+ ret = ZSTD_CCtx_setParameter(streamer->cctx,
+ ZSTD_c_enableLongDistanceMatching,
+ compress->zstd_long);
+ if (ZSTD_isError(ret))
+ {
+ pg_log_error("could not set compression flag for %s: %s",
+ "long", ZSTD_getErrorName(ret));
+ exit(1);
+ }
+ }
+
/* Initialize the ZSTD output buffer. */
streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 477dc7eb49b..9fc865ff299 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -31,6 +31,8 @@
static int expect_integer_value(char *keyword, char *value,
bc_specification *result);
+static bool expect_boolean_value(char *keyword, char *value,
+ bc_specification *result);
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
@@ -182,6 +184,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
result->workers = expect_integer_value(keyword, value, result);
result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
}
+ else if (strcmp(keyword, "long") == 0)
+ {
+ result->zstd_long = expect_boolean_value(keyword, value, result);
+ result->options |= BACKUP_COMPRESSION_OPTION_ZSTD_LONG;
+ }
else
result->parse_error =
psprintf(_("unknown compression option \"%s\""), keyword);
@@ -235,6 +242,43 @@ expect_integer_value(char *keyword, char *value, bc_specification *result)
return ivalue;
}
+/*
+ * Parse 'value' as a boolean and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1. The caller must check result->parse_error to determine if
+ * the call was successful.
+ *
+ * Valid values are: yes, no, on, off, 1, 0.
+ *
+ * Inspired by ParseVariableBool().
+ */
+static bool
+expect_boolean_value(char *keyword, char *value, bc_specification *result)
+{
+ if (value == NULL)
+ return true;
+
+ if (pg_strcasecmp(value, "yes") == 0)
+ return true;
+ if (pg_strcasecmp(value, "on") == 0)
+ return true;
+ if (pg_strcasecmp(value, "1") == 0)
+ return true;
+
+ if (pg_strcasecmp(value, "no") == 0)
+ return false;
+ if (pg_strcasecmp(value, "off") == 0)
+ return false;
+ if (pg_strcasecmp(value, "0") == 0)
+ return false;
+
+ result->parse_error =
+ psprintf(_("value for compression option \"%s\" must be a boolean"),
+ keyword);
+ return false;
+}
+
/*
* Returns NULL if the compression specification string was syntactically
* valid and semantically sensible. Otherwise, returns an error message.
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 6a0ecaa99c9..a378631a8da 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -24,6 +24,7 @@ typedef enum bc_algorithm
#define BACKUP_COMPRESSION_OPTION_LEVEL (1 << 0)
#define BACKUP_COMPRESSION_OPTION_WORKERS (1 << 1)
+#define BACKUP_COMPRESSION_OPTION_ZSTD_LONG (1 << 2)
typedef struct bc_specification
{
@@ -31,6 +32,7 @@ typedef struct bc_specification
unsigned options; /* OR of BACKUP_COMPRESSION_OPTION constants */
int level;
int workers;
+ int zstd_long;
char *parse_error; /* NULL if parsing was OK, else message */
} bc_specification;
--
2.17.1
0004-pg_basebackup-support-Zstd-negative-compression-leve.patch (text/x-diff; charset=us-ascii)
From bc6846ed93af475b079c4ab9bfa2a33c49a8a185 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH 4/4] pg_basebackup: support Zstd negative compression levels
"higher than maximum" is bogus
TODO: each compression method should enforce its own levels
---
src/bin/pg_basebackup/bbstreamer_zstd.c | 1 +
src/common/backup_compression.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 051b97458ba..491d6106cf5 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -114,6 +114,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
{
pg_log_error("could not set compression worker count to %d: %s",
compress->workers, ZSTD_getErrorName(ret));
+
exit(1);
}
}
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 9fc865ff299..dbaf008af8e 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -308,13 +308,17 @@ validate_bc_specification(bc_specification *spec)
else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
max_level = 12;
else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+ {
max_level = 22;
+ /* The minimum level depends on the version.. */
+ min_level = -7;
+ }
else
return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
get_bc_algorithm_name(spec->algorithm));
if (spec->level < min_level || spec->level > max_level)
- return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+ return psprintf(_("compression algorithm \"%s\" expects a nonzero compression level between %d and %d"),
get_bc_algorithm_name(spec->algorithm),
min_level, max_level);
}
--
2.17.1
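For readers unfamiliar with the zstd option that 0003 exposes: long-distance
matching widens the match window, which can help on large, self-similar inputs
such as base-backup tarballs, at the cost of extra memory on both the
compressing and decompressing side. A minimal sketch of the underlying libzstd
calls follows; this is an illustration only, and the window-log value is
hypothetical, not something the patch sets:

#include <zstd.h>

static void
enable_long_mode(ZSTD_CCtx *cctx)
{
	size_t		ret;

	/* Equivalent of --compress zstd:long on the command line. */
	ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
	if (ZSTD_isError(ret))
		return;					/* older libzstd may not know the parameter */

	/*
	 * Optionally widen the window too; 27 (128MB) is an illustrative value
	 * only. Frames produced with large windows may need a matching
	 * allowance to decompress (e.g. "zstd -d --long=27").
	 */
	(void) ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);
}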
On Mon, Mar 28, 2022 at 4:53 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
I suggest to write it differently, as in 0002.
That doesn't seem better to me. What's the argument for it?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Mar 28, 2022 at 05:39:31PM -0400, Robert Haas wrote:
On Mon, Mar 28, 2022 at 4:53 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
I suggest to write it differently, as in 0002.
That doesn't seem better to me. What's the argument for it?
I find this much easier to understand:
/* If we got an error or have reached the end of the string, stop. */
- if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+ if (result->parse_error != NULL)
+ break;
+ if (*kwend == '\0')
+ break;
+ if (vend != NULL && *vend == '\0')
break;
than
/* If we got an error or have reached the end of the string, stop. */
- if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+ if (result->parse_error != NULL ||
+ (vend == NULL ? *kwend == '\0' : *vend == '\0'))
Also, why wouldn't *kwend be checked in any case?
Justin Pryzby <pryzby@telsasoft.com> writes:
Also, why wouldn't *kwend be checked in any case ?
I suspect Robert wrote it that way intentionally --- but if so,
I agree it could do with more than zero commentary.
regards, tom lane
On Mon, Mar 28, 2022 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I suspect Robert wrote it that way intentionally --- but if so,
I agree it could do with more than zero commentary.
Well, the point is, we stop advancing kwend when we get to the end of
the keyword, and *vend when we get to the end of the value. If there's
a value, the end of the keyword can't have been the end of the string,
but the end of the value might have been. If there's no value, the end
of the keyword could be the end of the string.
Maybe if I just put that last sentence into the comment it's clear enough?
--
Robert Haas
EDB: http://www.enterprisedb.com
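A compact, standalone re-creation of that loop (hypothetical illustration
code; the names follow parse_bc_specification(), but this is not the committed
function) makes the invariant concrete:

#include <stdio.h>

static void
scan_spec(const char *specification)
{
	while (*specification)
	{
		const char *kwend = specification;
		const char *vend = NULL;

		/* The keyword runs up to '=', ',', or the end of the string. */
		while (*kwend && *kwend != '=' && *kwend != ',')
			kwend++;
		printf("keyword: %.*s\n", (int) (kwend - specification),
			   specification);

		/* Only an '=' introduces a value; otherwise vend stays NULL. */
		if (*kwend == '=')
		{
			vend = kwend + 1;
			while (*vend && *vend != ',')
				vend++;
			printf("  value: %.*s\n", (int) (vend - kwend - 1), kwend + 1);
		}

		/*
		 * If there's a value, the end of the keyword can't have been the
		 * end of the string, but the end of the value might have been; if
		 * there's no value, the end of the keyword could be. Testing *vend
		 * unconditionally here is the NULL dereference Coverity flagged.
		 */
		if (vend == NULL ? *kwend == '\0' : *vend == '\0')
			break;

		/* Advance past the ',' and loop around. */
		specification = (vend == NULL) ? kwend + 1 : vend + 1;
	}
}

int
main(void)
{
	scan_spec("workers=3,long,level=5");
	return 0;
}

Running it on "workers=3,long,level=5" walks all three entries, including the
value-less "long" that makes the vend == NULL arm reachable.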
Robert Haas <robertmhaas@gmail.com> writes:
This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.
run_log() is far from the only such method in PostgreSQL::Test::Cluster.
Here's a patch that gives the same treatment to all the methods that
just pass through to the corresponding PostgreSQL::Test::Utils function.
Also attached is a fix for a typo in the _get_env doc comment that I noticed
while auditing the return values.
- ilmari
Attachments:
0001-Make-more-PostgreSQL-Test-Cluster-methods-return-a-u.patch (text/x-diff; charset=utf-8)
From 2e6ccdb2148128357e26816776a448a0ef95a1c6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Wed, 30 Mar 2022 02:56:51 +0100
Subject: [PATCH] Make more PostgreSQL::Test::Cluster methods return a useful
value
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit ad4f2c47de440cdd5d58cf9ffea09afa0da04d6c made run_log() return
the value of the corresponding PostgreSQL::Test::Utils function, but
missed out a lot of other ones. This makes all the methods that call
a corresponding function in ::Utils pass on the underlying function's
return value so they too can be used in the idiomatic fashion of
$node->some_test(…) or diag(…);
---
src/test/perl/PostgreSQL/Test/Cluster.pm | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b6e3351611..c56a7e6c3b 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2376,8 +2376,7 @@ sub command_ok
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::command_ok(@_);
- return;
+ return PostgreSQL::Test::Utils::command_ok(@_);
}
=pod
@@ -2396,8 +2395,7 @@ sub command_fails
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::command_fails(@_);
- return;
+ return PostgreSQL::Test::Utils::command_fails(@_);
}
=pod
@@ -2416,8 +2414,7 @@ sub command_like
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::command_like(@_);
- return;
+ return PostgreSQL::Test::Utils::command_like(@_);
}
=pod
@@ -2436,8 +2433,7 @@ sub command_fails_like
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::command_fails_like(@_);
- return;
+ return PostgreSQL::Test::Utils::command_fails_like(@_);
}
=pod
@@ -2457,8 +2453,7 @@ sub command_checks_all
local %ENV = $self->_get_env();
- PostgreSQL::Test::Utils::command_checks_all(@_);
- return;
+ return PostgreSQL::Test::Utils::command_checks_all(@_);
}
=pod
@@ -2483,8 +2478,7 @@ sub issues_sql_like
my $result = PostgreSQL::Test::Utils::run_log($cmd);
ok($result, "@$cmd exit code 0");
my $log = PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
- like($log, $expected_sql, "$test_name: SQL found in server log");
- return;
+ return like($log, $expected_sql, "$test_name: SQL found in server log");
}
=pod
--
2.30.2
0002-Fix-typo-in-PostgreSQL-Test-Cluster-_get_env-docs.patch (text/x-diff)
From 24423ca6a9cc69adb6d0a08554d94dac25db6d27 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Wed, 30 Mar 2022 12:58:25 +0100
Subject: [PATCH 2/2] Fix typo in PostgreSQL::Test::Cluster::_get_env docs
It had the wrong opening bracket on the method call.
---
src/test/perl/PostgreSQL/Test/Cluster.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index c56a7e6c3b..b98bff278a 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1368,7 +1368,7 @@ sub _set_pg_version
#
# Routines that call Postgres binaries need to call this routine like this:
#
-# local %ENV = $self->_get_env{[%extra_settings]);
+# local %ENV = $self->_get_env([%extra_settings]);
#
# A copy of the environment is taken and node's host and port settings are
# added as PGHOST and PGPORT, then the extra settings (if any) are applied.
--
2.30.2
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 28, 2022 at 12:52 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
True, but that also means it shows up in the actual failure message,
which seems too verbose. By just using 'print', it ends up in the log
file if it's needed, but not anywhere else. Maybe there's a better way
to do this, but I don't think using note() is what I want.
That is the difference between note() and diag(): note() prints to
stdout so is not visible under a non-verbose prove run, while diag()
prints to stderr so it's always visible.
OK, but print doesn't do either of those things. The output only shows
up in the log file, even with --verbose. Here's an example of what the
log file looks like:
# Running: pg_verifybackup -n -m
/Users/rhaas/pgsql/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup/backup_manifest
-e /Users/rhaas/pgsql/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/extracted-backup
backup successfully verified
ok 6 - verify backup, compression gzip
As you can see, there is a line here that does not begin with #. That
line is the standard output of a command that was run by the test
script.
Oh, that must be some non-standard output handling that our test setup
does. Plain `prove` shows everything on stdout and stderr in verbose
mode, and only stderr in non-verbose mode:
$ cat verbosity.t
use strict;
use warnings;
use Test::More;
pass "pass";
diag "diag";
note "note";
print "print\n";
system qw(echo system);
done_testing;
$ prove verbosity.t
verbosity.t .. 1/? # diag
verbosity.t .. ok
All tests successful.
Files=1, Tests=1, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.04 cusr 0.01 csys = 0.07 CPU)
Result: PASS
$ prove -v verbosity.t
verbosity.t ..
ok 1 - pass
# diag
# note
print
system
1..1
ok
All tests successful.
Files=1, Tests=1, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.06 cusr 0.00 csys = 0.08 CPU)
Result: PASS
- ilmari
On 3/30/22 08:06, Dagfinn Ilmari Mannsåker wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Mar 28, 2022 at 12:52 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
True, but that also means it shows up in the actual failure message,
which seems too verbose. By just using 'print', it ends up in the log
file if it's needed, but not anywhere else. Maybe there's a better way
to do this, but I don't think using note() is what I want.
That is the difference between note() and diag(): note() prints to
stdout so is not visible under a non-verbose prove run, while diag()
prints to stderr so it's always visible.
OK, but print doesn't do either of those things. The output only shows
up in the log file, even with --verbose. Here's an example of what the
log file looks like:
# Running: pg_verifybackup -n -m
/Users/rhaas/pgsql/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup/backup_manifest
-e /Users/rhaas/pgsql/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/extracted-backup
backup successfully verified
ok 6 - verify backup, compression gzip
As you can see, there is a line here that does not begin with #. That
line is the standard output of a command that was run by the test
script.
Oh, that must be some non-standard output handling that our test setup
does. Plain `prove` shows everything on stdout and stderr in verbose
mode, and only stderr in non-verbose mode:
Yes, PostgreSQL::Test::Utils hijacks STDOUT and STDERR (see the INIT
block).
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Wed, Mar 30, 2022 at 8:00 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.
run_log() is far from the only such method in PostgreSQL::Test::Cluster.
Here's a patch that gives the same treatment to all the methods that
just pass through to the corresponding PostgreSQL::Test::Utils function.
Also attached is a fix for a typo in the _get_env doc comment that I noticed
while auditing the return values.
I suggest posting these patches on a new thread with a subject line
that matches what the patches do, and adding it to the next
CommitFest. It seems like a reasonable thing to do on first glance,
but I wouldn't want to commit it without going through and figuring
out whether there's any risk of anything breaking, and it doesn't seem
like there's a strong need to do it in v15 rather than v16.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Mar 30, 2022 at 8:00 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.
run_log() is far from the only such method in PostgreSQL::Test::Cluster.
Here's a patch that gives the same treatment to all the methods that
just pass through to the corresponding PostgreSQL::Test::Utils function.
Also attached is a fix for a typo in the _get_env doc comment that I noticed
while auditing the return values.
I suggest posting these patches on a new thread with a subject line
that matches what the patches do, and adding it to the next
CommitFest.
Will do.
It seems like a reasonable thing to do on first glance, but I wouldn't
want to commit it without going through and figuring out whether
there's any risk of anything breaking, and it doesn't seem like
there's a strong need to do it in v15 rather than v16.
Given that the methods don't currently have a useful return value (undef
or the empty list, depending on context), I don't expect anything to be
relying on it (and it passed check-world with --enable-tap-tests and all
the --with-foo flags I could easily get to work), but I can grep the
code as well to be extra sure.
- ilmari
On Tue, Mar 29, 2022 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 28, 2022 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I suspect Robert wrote it that way intentionally --- but if so,
I agree it could do with more than zero commentary.
Well, the point is, we stop advancing kwend when we get to the end of
the keyword, and *vend when we get to the end of the value. If there's
a value, the end of the keyword can't have been the end of the string,
but the end of the value might have been. If there's no value, the end
of the keyword could be the end of the string.
Maybe if I just put that last sentence into the comment it's clear enough?
Done that way, since I thought it was better to fix the bug than wait
for more feedback on the wording. We can still adjust the wording, or
the coding, if it's not clear enough.
--
Robert Haas
EDB: http://www.enterprisedb.com
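To make the quoted pointer logic concrete, here is a minimal sketch
(illustrative only; the function and variable names mirror the
discussion above rather than the committed code, and for simplicity it
assumes a value runs to the end of the string):

#include <stddef.h>

static void
parse_keyword_value(const char *s, const char **kwend, const char **vend)
{
	const char *p = s;

	/* Stop advancing kwend at '=' or at the end of the string. */
	while (*p != '\0' && *p != '=')
		p++;
	*kwend = p;

	if (*p == '=')
	{
		/* There is a value, so the keyword did not end the string... */
		p++;
		while (*p != '\0')
			p++;
		*vend = p;			/* ...but the value might have. */
	}
	else
		*vend = NULL;		/* No value: the keyword may end the string. */
}

In this sketch, *kwend can only be '\0' when there is no value, which
is exactly the case the comment needs to call out.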
Robert Haas <robertmhaas@gmail.com> writes:
Maybe if I just put that last sentence into the comment it's clear enough?
Done that way, since I thought it was better to fix the bug than wait
for more feedback on the wording. We can still adjust the wording, or
the coding, if it's not clear enough.
FWIW, I thought that explanation was fine, but I was deferring to
Justin who was the one who thought things were unclear.
regards, tom lane
On Wed, Mar 30, 2022 at 04:14:47PM -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Maybe if I just put that last sentence into the comment it's clear enough?
Done that way, since I thought it was better to fix the bug than wait
for more feedback on the wording. We can still adjust the wording, or
the coding, if it's not clear enough.

FWIW, I thought that explanation was fine, but I was deferring to
Justin who was the one who thought things were unclear.
I still think it's unnecessarily confusing to nest "if" and "?:" conditionals
in one statement, instead of 2 or 3 separate "if"s, or "||"s.
But it's also not worth fussing over any more.
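For what it's worth, the two styles being contrasted might look
something like this (hypothetical names and condition, not the actual
patch code):

#include <stdbool.h>
#include <stddef.h>

/* Nested form: an "if" whose condition folds in a "?:" decision. */
static bool
at_end_nested(const char *kwend, const char *vend)
{
	if (*(vend != NULL ? vend : kwend) == '\0')
		return true;
	return false;
}

/* Separate "if"s: each case spelled out on its own. */
static bool
at_end_separate(const char *kwend, const char *vend)
{
	if (vend != NULL)
		return *vend == '\0';
	return *kwend == '\0';
}

Both behave identically; the disagreement is purely about which reads
more clearly.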
On 3/30/22 08:00, Dagfinn Ilmari Mannsåker wrote:
Robert Haas <robertmhaas@gmail.com> writes:
This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.

run_log() is far from the only such method in PostgreSQL::Test::Cluster.
Here's a patch that gives the same treatment to all the methods that
just pass through to the corresponding PostgreSQL::Test::Utils function.

Also attached is a fix for a typo in the _get_env doc comment that I noticed
while auditing the return values.
None of these routines in Utils.pm returns a useful value (unlike
run_log()). Typically we don't return the value of Test::More routines.
So -1 on patch 1. I will fix the typo.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Thu, Mar 23, 2023 at 2:50 PM Thomas Munro <thomas.munro@gmail.com> wrote:
In rem: commit 3500ccc3,
for X in ` grep -E '^[^*]+event_name = "'
src/backend/utils/activity/wait_event.c |
sed 's/^.* = "//;s/";$//;/unknown/d' `
do
if ! git grep "$X" doc/src/sgml/monitoring.sgml > /dev/null
then
echo "$X is not documented"
fi
done

BaseBackupSync is not documented
BaseBackupWrite is not documented
[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]
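For context, the entries the script greps for look roughly like this
in src/backend/utils/activity/wait_event.c (an abridged sketch from
memory; the exact function and enum names may differ):

static const char *
pgstat_get_wait_io(WaitEventIO w)
{
	const char *event_name = "unknown wait event";

	switch (w)
	{
		/* ... other wait events ... */
		case WAIT_EVENT_BASEBACKUP_SYNC:
			event_name = "BaseBackupSync";
			break;
		case WAIT_EVENT_BASEBACKUP_WRITE:
			event_name = "BaseBackupWrite";
			break;
		/* ... other wait events ... */
	}
	return event_name;
}

Each such event_name is expected to have a matching entry in
doc/src/sgml/monitoring.sgml, which is what the git grep in the script
checks for.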
On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
BaseBackupSync is not documented
BaseBackupWrite is not documented

[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]
Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 23, 2023 at 4:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
BaseBackupSync is not documented
BaseBackupWrite is not documented

[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]

Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.
Here's a patch for that, and a patch to add the missing error check
Peter noticed.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
0001-Add-missing-documentation-entries-for-new-base-backu.patch (application/octet-stream)
From c2c6395c2c38eaedeac6a9c045bd4a8b791eb414 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 24 Mar 2023 10:37:33 -0400
Subject: [PATCH 1/2] Add missing documentation entries for new base backup
wait events.
Per complaint from Thomas Munro.
---
doc/src/sgml/monitoring.sgml | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 21e6ce2841..488b76c765 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1272,6 +1272,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>BaseBackupRead</literal></entry>
<entry>Waiting for base backup to read from a file.</entry>
</row>
+ <row>
+ <entry><literal>BaseBackupSync</literal></entry>
+ <entry>Waiting for data written by a base backup to reach durable storage.</entry>
+ </row>
+ <row>
+ <entry><literal>BaseBackupWrite</literal></entry>
+ <entry>Waiting for base backup to write to a file.</entry>
+ </row>
<row>
<entry><literal>BufFileRead</literal></entry>
<entry>Waiting for a read from a buffered file.</entry>
--
2.37.1 (Apple Git-137.1)
0002-basebackup_to_shell-Add-missing-error-check.patch (application/octet-stream)
From 509985230a3da4b9c6047ad38094f061660c059a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 24 Mar 2023 10:44:03 -0400
Subject: [PATCH 2/2] basebackup_to_shell: Add missing error check.
Per complaint from Peter Eisentraut.
---
contrib/basebackup_to_shell/basebackup_to_shell.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/contrib/basebackup_to_shell/basebackup_to_shell.c b/contrib/basebackup_to_shell/basebackup_to_shell.c
index 29f5069d42..57ed587d48 100644
--- a/contrib/basebackup_to_shell/basebackup_to_shell.c
+++ b/contrib/basebackup_to_shell/basebackup_to_shell.c
@@ -263,6 +263,11 @@ shell_run_command(bbsink_shell *sink, const char *filename)
/* Run it. */
sink->pipe = OpenPipeStream(sink->current_command, PG_BINARY_W);
+ if (sink->pipe == NULL)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not execute command \"%s\": %m",
+ sink->current_command)));
}
/*
--
2.37.1 (Apple Git-137.1)
On Fri, Mar 24, 2023 at 10:46:37AM -0400, Robert Haas wrote:
On Thu, Mar 23, 2023 at 4:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
BaseBackupSync is not documented
BaseBackupWrite is not documented

[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]

Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.
Here's a patch for that, and a patch to add the missing error check
Peter noticed.
I think these maybe got forgotten?
On Wed, Apr 12, 2023 at 10:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
I think these maybe got forgotten ?
Committed.
--
Robert Haas
EDB: http://www.enterprisedb.com